
Applied Soft Computing 136 (2023) 109919


A combined mixed integer programming and deep neural network-assisted heuristics algorithm for the nurse rostering problem✩

Ziyi Chen a,b, Patrick De Causmaecker b, Yajie Dou a,∗

a College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
b Department of Computer Science, KU Leuven Kulak, E. Sabbelaan 53, 8500 Kortrijk, Belgium

Article info

Article history:
Received 20 December 2021
Received in revised form 15 October 2022
Accepted 3 December 2022
Available online 13 December 2022

Dataset link: http://www.schedulingbenchmarks.org/index.html

Keywords: Combinatorial optimization; Hybrid algorithm; Deep neural network; Nurse rostering problem; Heuristic

Abstract

The objective of the nurse rostering problem (NRP) is to obtain a scheduling plan that optimizes the allocation of human resources, effectively reducing work pressure on nurses and improving work efficiency and quality. Because various constraints must be considered during scheduling, the NRP is complicated and known to be NP-hard. Existing research has not combined learning mechanisms with the NRP. This study constructively explores the possibility of combining an optimization method and a learning mechanism to automatically produce feasible solutions and proposes a feature vector and a reconstruction mechanism to assist in this exploration. We aim to learn a policy that is generalizable for NRPs of various sizes and design a hybrid algorithm with learning and optimization methods to solve the general NRP. The algorithm has two main parts: a deep neural network (DNN) improvement part and a reconstruction part. In the DNN improvement part, a feature vector is used to describe heterogeneous NRP solutions and normalizes these solutions to the same dimension. Then, the DNN model determines the best heuristic for approximating the local optimal solution. The method reconstructs the structure of the current solution with embedded mixed integer programming (MIP), quickly escaping the local optimum and enhancing the diversity of the search process, increasing the likelihood of determining an optimal solution. Different experiments and statistical tests were conducted by comparing various configurations and approaches. The detailed computational and statistical results demonstrate the competitive performance of the proposed method.

© 2022 Elsevier B.V. All rights reserved.

1. Introduction

The nurse rostering problem (NRP) is a classic real-world combinatorial optimization problem. In practice, high-quality schedules reduce hospital operating costs and maximize the rational use of human resources, both of which contribute to improved nurse efficiency and reduced turnover rates [1–3]. From a methodological perspective, it is well known that the NRP is NP-hard [4–8]. Therefore, it is important to develop an algorithm that can generate high-quality solutions in short periods of time, not only for the NRP but also for similar types of problems, such as airline scheduling, emergency services, and transportation sector issues.

In this study, we investigated a combined mixed integer programming (MIP) and deep neural network (DNN)-assisted heuristics algorithm (MIP-DNN). The main idea of our work is to learn a selection policy that is generalizable for NRPs of various sizes, to guide the decisions of the DNN model, and to selectively reconstruct local optimal solutions. We constructed a feature vector to describe heterogeneous NRP solutions, which served as input for the general DNN model. After the initial solution was generated, the solution was improved using predetermined heuristics. During the improvement process, our model determined which direction to explore next with the learned selection policy; thus, the local optimal solution could be rapidly approximated. After a local optimal solution was determined, the method reconstructed the structure of the current solution with an embedded MIP solver, thereby quickly leaving the local optimum and enhancing the diversity of the scheduling, which increased the possibility of determining the optimal solution. Detailed computational results are presented as a comparison with other proposed methods.

✩ This work was supported by the AI in Flanders, KU Leuven Research Fund [RKU-D2932-C24/17/012]; the National Natural Science Foundation of China [71901214; 71690233]; and the Data-driven Logistics [FWO-S007318N].
∗ Corresponding author. E-mail address: yajiedou_nudt@163.com (Y. Dou).

The contributions of this paper are as follows:

• The adoption of the feature vector to summarize the solution characteristics. Our algorithm uses a feature vector to normalize solutions to the same dimension when faced with solutions of different sizes. With this generalized problem-solving concept, the DNN model can learn policies from small instances and apply these policies to large instances; thus, our algorithm is generalizable.
• The DNN-assisted mechanism for heuristics selection during the search process. Other studies have used learning mechanisms to develop algorithm portfolios, tune parameters, or select strategies. For instance, [9] proposed an automatic technique that constructed portfolios of good algorithm configurations and selected the best configuration for a given instance. [10] proposed a deep learning parameter optimization approach based on long short-term memory neural networks. [11] developed a data-driven framework for designing selection strategies. In contrast, we use the learning mechanism to dynamically guide the scheduling behavior.
• The embedded MIP solver, which reconstructs the structure of the solution and escapes local optima. With a hybrid algorithm, we can use the MIP to change the structure of the local optimal solution. Inspired by [12], we propose a scheme for calculating the cell penalty and determining the reconstruction area, which guides the search process to focus on high-penalty areas that require more reconstruction.

The remainder of this paper is organized as follows. Section 2 presents a literature review of related work. The problem description, constraints, and mathematical model of the NRP are introduced in Section 3. The MIP-DNN approach is proposed in Section 4. The computational experiments and analyses of the results are described in Section 5. The conclusions are presented in Section 6.

2. Related work

2.1. NRP

The NRP has been extensively studied in recent decades, especially in the last decade. The methods for solving the NRP can be divided into two main categories: exact methods and metaheuristic methods. Exact methods can produce the optimal solution or a new lower bound. [13] used constraint programming (CP) to solve a set of real-world cases. In 2014, new lower bounds and optimal solutions for nurse rostering benchmark instances [14] were determined by [15] via branch & price (B&P) and column generation (CG) methods. Voogd also determined novel solutions for several nurse rostering benchmark instances [14] using integer programming (IP) and constraint modeling [16]. Although exact methods can produce high-quality solutions in some situations, they cannot always solve large problems in a reasonable amount of time; thus, exact methods are often unsuitable for practical applications. In contrast, metaheuristic methods can generate high-quality solutions in a short amount of time. There are various types of metaheuristic algorithms, including the well-known greedy algorithm [17], the bee colony algorithm [18], the simulated annealing algorithm [19], and the evolutionary algorithm [20], all of which have demonstrated their effectiveness. [21] compared multiple heuristic and metaheuristic algorithms, combined heuristic and metaheuristic algorithms, and evaluated the algorithm performance in two scenarios. Variable neighborhood search (VNS) and its variants are also commonly used. [22] proposed a method that combined heuristic ordering with VNS and performed significantly better than other commercial algorithms. In 2013, based on previous studies, researchers combined single neighborhoods, transformed these neighborhoods into more effective compound movements, and solved a more complicated NRP [23]. With the neighborhood as the target, [24] intelligently eliminated large neighborhood portions that were predicted not to improve the objective function. In addition, they evaluated real-world data obtained from hospitals in Belgium. However, metaheuristic methods have a significant disadvantage: the shorter computing time comes at the cost of solution accuracy.

Each type of method therefore has advantages and disadvantages, and an increasing number of researchers have attempted to develop hybrid methods. [25] embedded heuristics in their CP method and successfully determined a novel solution. This method is based primarily on an exact method but incorporates a heuristic method. [12] combined IP and a set of heuristics using horizontal exchanges. [26] combined dynamic programming (DP) and a set of more complex heuristics. These methods are based primarily on heuristic methods and are supplemented by exact methods. However, when heuristic methods are used, the algorithms constantly iterate heuristics until they fall into a local optimum. Although iterating without selection is possible, we risk missing good solutions. [27] introduced a statistical Markov model to select among low-level heuristics, which was also an attempt to introduce a learning mechanism.

In addition, our previous work [28] introduced a learning model for the NRP. We proposed a method called DNN-assisted tree search (DNNTS). The algorithm is based on a tree search and enhanced by a branching heuristic. The branching heuristic uses a DNN model to estimate the distance between a solution and the optimal solution, and the algorithm then follows the shortest estimated distance to search for the best solution. This multilayer DNN includes an input layer, several hidden layers and an output layer, and it is trained on a set of instances with known solutions. However, because the DNN model uses the matrix corresponding to the solution as its input, each model can be applied only to instances and variants of the same size.

2.2. Machine learning and discrete optimization

In other fields, machine learning (ML) approaches for addressing discrete optimization problems have also advanced considerably. Table 1 provides a comprehensive summary of studies on ML approaches for addressing discrete optimization problems. As early as 1985, some scholars used a Hopfield neural network (HNN) to solve the small-scale traveling salesman problem (TSP) [29]. In recent years, an increasing amount of research has focused on the application of machine learning to discrete optimization problems. One approach is end-to-end learning, in which a pretrained ML model is used to directly solve a problem. The pointer network model was proposed as a supervised learning approach and was used to solve the TSP [30]. However, the pointer network model with the supervised learning method requires a large number of labeled datasets of optimal paths, and the upper limit of the model performance does not exceed the quality of the labeled solutions; thus, this model is difficult to apply in practice. To address these shortcomings, reinforcement learning methods have been proposed and applied. For example, [31] used REINFORCE to train the pointer model and introduced a critic network as a baseline to reduce training variance. Inspired by Bello's work, [32] simplified the pointer network model and extended it to the vehicle routing problem (VRP). [33,34] used the transformer model as a reference and improved the traditional

Table 1
Summary of studies on ML approaches for addressing discrete optimization problems.

Reference   Methods                                                        Application background (problem type)
[29]        HNN                                                            TSP
[30]        Pointer network + supervised learning                          TSP
[31]        Pointer network + REINFORCE & critic baseline                  TSP
[32]        Pointer network + REINFORCE & critic baseline                  TSP, VRP
[33]        Transformer attention + REINFORCE & critic baseline            TSP
[34]        Transformer attention + REINFORCE & rollout baseline           TSP, VRP
[35]        Graph pointer network + HRL                                    TSP
[36]        GCN + supervised learning + tree search                        Satisfiability, maximal independent set, minimum vertex cover, maximal clique
[38]        GNN + supervised learning + beam search                        TSP
[39]        GCN + supervised learning + beam search                        TSP
[40]        MIP + VNS                                                      Distributed operating room scheduling
[45]        Ant colony algorithm + heterogeneous guide + space explosion   TSP
[41]        MCTS + DNN                                                     Production scheduling
[42]        Pointer network + actor–critic                                 VRP
[43]        Transformer attention + REINFORCE                              VRP
[44]        DRL + CP                                                       TSP, portfolio optimization

pointer network model. A graphical pointer network that combined the pointer network with a graph neural network (GNN) was designed to solve the large-scale TSP and the TSP with time window constraints [35], and the model was trained by hierarchical reinforcement learning. Since it is often difficult to solve problems directly using machine learning, attempts have been made to introduce the concept of learning into the optimization process to more effectively search for the optimal solution. Based on the probability that the output node of a convolutional network belongs to the optimal solution, a feasible solution can be constructed by directing the tree search [36]. [37,38] estimated the probability of edge selection using GNN and graph convolutional network (GCN) models and constructed the final solution by introducing beam search. [39] proposed an MIP model and designed a multiobjective learning variable neighborhood search algorithm to address the issue of operating room scheduling. To address the slow convergence speed and low solution accuracy of the traditional ant colony algorithm when faced with large-scale problems, a space explosion strategy and short-term memory were used to improve the algorithm, which was then applied to the TSP; [40] proposed this heterogeneous guided ant colony algorithm. In addition, machine learning has been used to adjust the metaheuristic solution of discrete optimization problems, for example by determining the initial solution or choosing an appropriate heuristic method. In [41], the authors used a Monte Carlo tree search (MCTS) to train a DNN for short-term production scheduling decisions. [42] proposed NeuRewriter, which improves the iteration operator by choosing an appropriate heuristic method through reinforcement learning and rewrites local components of the current solution; this algorithm performed well on vehicle routing and online job scheduling problems. Improved iterative operators based on reinforcement learning have also been used to optimize solutions of the capacitated vehicle routing problem, further enhancing the impact of reinforcement learning on discrete optimization problems [43]. The development of deep reinforcement learning (DRL) provides a new direction for studying optimization problems. A general and complete solver based on DRL and CP has been introduced to solve complex problems that can be modeled with DP [44]. This work was meaningful because it was the first time that a learned heuristic algorithm was embedded directly in a CP solver.

2.3. Summary

In summary, to address the NRP, both exact methods and heuristic methods have strengths and weaknesses. Exact methods can determine the optimal solution or a new lower bound; however, their performance is limited by the size of the instance, and the computing time is usually unacceptable for large-scale instances. Heuristic methods can generate high-quality feasible solutions in a short amount of time; however, the accuracy of the solution is reduced. Therefore, some studies have begun to consider learning mechanisms. We summarized the research on ML approaches for discrete optimization problems. Research and applications in this field are still at a preliminary stage; the performance is not yet ideal, and most research focuses on the TSP and VRP. However, these studies provide new ideas for combining learning and optimization. As research has advanced, an increasing number of well-designed models have been developed to solve discrete optimization problems. Compared with the widespread application of ML approaches to the TSP and VRP, the NRP is still a new field to be explored with ML approaches. The Markov model proposed in [27] can optimally guide the heuristics, and we have previously formally introduced learning into the NRP both in concept and in a model [28]; however, compared with mature general models that can solve NRPs of different sizes, these methods are not sufficient. Therefore, our motivation for this study is to combine a learning model with an optimization method to solve general problems.

3. Problem description

The goal of the NRP discussed in this study is to minimize the penalty value while satisfying hard and soft constraints. Nurses work in different types of shifts, including early, day, and late shifts. The NRP fixes the number of nurses and the amount of time for which the nurses are available. A scheduling plan is required to determine which nurses are assigned to each shift each day. This problem has two types of constraints: hard constraints, which must be satisfied to obtain a feasible solution, and soft constraints, which may be violated but are penalized in the objective function. The hard constraints (HC1–9) cover shift rotations, the total working time, weekend working days, consecutive working days, consecutive rest days, and specified shifts that must not be assigned to specific nurses on specific days. The soft constraints (SC) include the upper and lower limits on the number of nurses assigned to each shift and the individual preferences of the nurses.

In the illustrative example described in Fig. 1, the NRP needs to consider the nurses' demands and the service demand together to obtain a high-quality scheduling plan. The nurses' demands

Fig. 1. An illustration of the NRP.

include shift preference, time preference and individual preference. Shift preference reflects a nurse's requirements regarding the shift type; for example, a part-time nurse cannot undertake the early shift. Time preference reflects a nurse's requirements regarding time, such as the minimum and maximum working time, weekend working days and consecutive working/rest days. Individual preference reflects a nurse's specific private requirements; for example, nurse 2 does not want to work on Saturday because it is his or her birthday. The service demand comprises the number of shifts of each type requested per day. We use different characters to represent different types of shifts, where E represents an early shift and L represents a late shift. In the table corresponding to the scheduling plan, each row represents the arrangement of a specific nurse during the scheduling period, and each column represents the arrangement of a specific day for all nurses. The specific shift of each nurse on a day is indicated by the character (E/L) filled in the cell; an empty cell indicates that the nurse is not scheduled to work on that day.

We provide a brief mathematical model for the NRP introduced by [46]. Additional information and a more detailed description can be found in that paper.

Indices:

$i$   Nurse index, $i \in \{1, 2, \ldots, i_m\}$, where $i_m$ is the number of nurses.
$d$   Day index, $d \in \{1, 2, \ldots, d_m\}$, where $d_m$ is the number of days.
$t$   Shift type index, $t \in \{1, 2, \ldots, t_m\}$, where $t_m$ is the number of shift types.
$w$   Weekend index, $w \in \{1, 2, \ldots, d_m/7\}$, where $d_m/7$ is the number of weekends.

Decision variables:

$x_{idt}$   1 if nurse $i$ is assigned shift type $t$ on day $d$ and 0 otherwise.
$k_{iw}$    1 if nurse $i$ works on weekend $w$ and 0 otherwise.
$y_{dt}$    Total below the preferred cover for shift type $t$ on day $d$.
$z_{dt}$    Total above the preferred cover for shift type $t$ on day $d$.

Parameters:

$R_t$            Set of shift types that cannot be assigned immediately after shift type $t$.
$N_i$            Set of days on which nurse $i$ cannot be assigned a shift.
$l_t$            Length of shift type $t$ in minutes.
$m_{it}^{max}$   Maximum number of type-$t$ shifts that can be assigned to nurse $i$.
$b_i^{min}$      Minimum number of minutes that nurse $i$ must be assigned.
$b_i^{max}$      Maximum number of minutes that nurse $i$ can be assigned.
$c_i^{min}$      Minimum number of consecutive shifts that nurse $i$ must work.
$c_i^{max}$      Maximum number of consecutive shifts that nurse $i$ can work.
$o_i^{min}$      Minimum number of consecutive days off that nurse $i$ can be assigned.
$a_i^{max}$      Maximum number of weekends that nurse $i$ can work.
$q_{idt}$        Penalty if shift type $t$ is not assigned to nurse $i$ on day $d$.
$p_{idt}$        Penalty if shift type $t$ is assigned to nurse $i$ on day $d$.
$u_{dt}$         Preferred total number of nurses assigned shift type $t$ on day $d$.
$v_{dt}^{min}$   Weight if below the preferred cover for shift type $t$ on day $d$.
$v_{dt}^{max}$   Weight if above the preferred cover for shift type $t$ on day $d$.

Objective function:

$$\min \sum_{i=1}^{i_m}\sum_{d=1}^{d_m}\sum_{t=1}^{t_m} q_{idt}(1 - x_{idt}) + \sum_{i=1}^{i_m}\sum_{d=1}^{d_m}\sum_{t=1}^{t_m} p_{idt}\, x_{idt} + \sum_{d=1}^{d_m}\sum_{t=1}^{t_m} y_{dt}\, v_{dt}^{min} + \sum_{d=1}^{d_m}\sum_{t=1}^{t_m} z_{dt}\, v_{dt}^{max}. \tag{1}$$

The objective function has four parts: the penalties caused by violating shift-on/off requests and the penalties caused by exceeding or not reaching the preferred cover, where a shift-on/off request is a request that a specified shift should/should not be assigned to a specified nurse on a specified day. The objective function minimizes the overall penalty.
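To make Eq. (1) concrete, the following sketch evaluates the objective for a candidate roster. It is a minimal illustration assuming NumPy arrays; the names x, q, p, u, v_min and v_max are hypothetical stand-ins for the symbols defined above.

```python
import numpy as np

def objective(x, q, p, u, v_min, v_max):
    """Evaluate Eq. (1) for a binary roster x of shape (nurses, days, shifts).

    q and p follow x's shape (shift-on/off request penalties); u, v_min and
    v_max have shape (days, shifts) (preferred cover and violation weights).
    """
    on_request = np.sum(q * (1 - x))   # requested shifts that were not assigned
    off_request = np.sum(p * x)        # shifts assigned against a request
    cover = x.sum(axis=0)              # nurses working each (day, shift)
    y = np.maximum(u - cover, 0)       # total below the preferred cover (y_dt)
    z = np.maximum(cover - u, 0)       # total above the preferred cover (z_dt)
    return on_request + off_request + np.sum(y * v_min) + np.sum(z * v_max)
```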

Constraints:

HC1—Nurses cannot be assigned more than one shift in one day.
$$\sum_{t=1}^{t_m} x_{idt} \le 1, \quad \forall i \in \{1, 2, \ldots, i_m\},\ d \in \{1, 2, \ldots, d_m\}. \tag{2}$$

HC2—Shift rotation: certain shifts cannot be followed by other shifts.
$$x_{idt} + x_{i(d+1)u} \le 1, \quad \forall i \in \{1, 2, \ldots, i_m\},\ d \in \{1, \ldots, d_m - 1\},\ t \in \{1, 2, \ldots, t_m\},\ u \in R_t. \tag{3}$$

HC3—Limitation on the maximum number of each shift that can be assigned to each nurse.
$$\sum_{d=1}^{d_m} x_{idt} \le m_{it}^{max}, \quad \forall i \in \{1, 2, \ldots, i_m\},\ t \in \{1, 2, \ldots, t_m\}. \tag{4}$$

HC4—Minimum and maximum work times.
$$b_i^{min} \le \sum_{d=1}^{d_m}\sum_{t=1}^{t_m} l_t\, x_{idt} \le b_i^{max}, \quad \forall i \in I. \tag{5}$$

HC5—Maximum number of consecutive shifts.
$$\sum_{j=d}^{d+c_i^{max}} \sum_{t=1}^{t_m} x_{ijt} \le c_i^{max}, \quad \forall i \in \{1, 2, \ldots, i_m\},\ d \in \{1, \ldots, d_m - c_i^{max}\}. \tag{6}$$

HC6—Minimum number of consecutive shifts.
$$\sum_{t=1}^{t_m} x_{idt} + \left(s - \sum_{j=d+1}^{d+s}\sum_{t=1}^{t_m} x_{ijt}\right) + \sum_{t=1}^{t_m} x_{i(d+s+1)t} > 0, \quad \forall i \in \{1, 2, \ldots, i_m\},\ s \in \{1, \ldots, c_i^{min}\},\ d \in \{1, \ldots, d_m - (s+1)\}. \tag{7}$$

HC7—Minimum number of consecutive days off.
$$\sum_{t=1}^{t_m} (1 - x_{idt}) + \sum_{j=d+1}^{d+s}\sum_{t=1}^{t_m} x_{ijt} + \sum_{t=1}^{t_m} (1 - x_{i(d+s+1)t}) > 0, \quad \forall i \in \{1, 2, \ldots, i_m\},\ s \in \{1, \ldots, o_i^{min}\},\ d \in \{1, \ldots, d_m - (s+1)\}. \tag{8}$$

HC8—Maximum number of weekends.
$$k_{iw} \le \sum_{t=1}^{t_m} x_{i(7w-1)t} + \sum_{t=1}^{t_m} x_{i(7w)t} \le 2k_{iw}, \quad \forall i \in \{1, 2, \ldots, i_m\},\ w \in \{1, \ldots, d_m/7\},$$
$$\sum_{w=1}^{d_m/7} k_{iw} \le a_i^{max}, \quad \forall i \in \{1, 2, \ldots, i_m\}. \tag{9}$$

HC9—Days that nurses cannot work.
$$x_{idt} = 0, \quad \forall i \in \{1, 2, \ldots, i_m\},\ d \in N_i,\ t \in \{1, 2, \ldots, t_m\}. \tag{10}$$

SC—Cover requirement.
$$\sum_{i=1}^{i_m} x_{idt} - z_{dt} + y_{dt} = u_{dt}, \quad \forall d \in \{1, 2, \ldots, d_m\},\ t \in \{1, 2, \ldots, t_m\}. \tag{11}$$
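As a concrete reading of some of the hard constraints, the sketch below checks HC1, HC3 and HC4 for a candidate roster. It is only an illustration under the array conventions of the earlier objective sketch, with m_max, l, b_min and b_max as hypothetical names for $m_{it}^{max}$, $l_t$, $b_i^{min}$ and $b_i^{max}$.

```python
import numpy as np

def check_basic_hard_constraints(x, m_max, l, b_min, b_max):
    """Return True iff roster x (nurses, days, shifts) satisfies HC1, HC3, HC4."""
    if np.any(x.sum(axis=2) > 1):          # HC1, Eq. (2): one shift per nurse/day
        return False
    if np.any(x.sum(axis=1) > m_max):      # HC3, Eq. (4): per-type caps per nurse
        return False
    minutes = np.einsum('idt,t->i', x, l)  # HC4, Eq. (5): assigned minutes
    return bool(np.all(minutes >= b_min) and np.all(minutes <= b_max))
```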
4. The proposed MIP-DNN approach

The central idea of our framework is that we can train a DNN model to rank feasible solutions based on their features by recording and learning the search order. Then, we can use the MIP solver to change the structure and determine a possible direction for improving the local optima.

A description of the proposed hybrid algorithm is presented in Algorithm 1. After the initial solution is obtained, a DNN-assisted heuristics algorithm (DNNHeuristics()) uses the DNN model and two sets of heuristics to improve the input solution until a certain number of heuristics have been applied without improvement. The best solution is recorded. Then, the solution obtained by the DNNHeuristics() algorithm is used as the input to the embedded MIP solver (MIPReconstruction()), which is used to change the structure on a large scale, improving the objective function value. MIPReconstruction() destroys and rebuilds the high-penalty areas of the solution. The reconstruction returns a feasible solution with a different structure. If the penalty is reduced, the best solution is recorded. The obtained solution is then used in the first step of the algorithm as the input to DNNHeuristics(). Thus, through targeted minor changes in DNNHeuristics() and large-scale changes in MIPReconstruction(), the solution is improved until the stop condition is met. A Python sketch of this loop follows Algorithm 1 below.

Algorithm 1 MIP-DNN(s)
Require: s
Ensure: bests
repeat
    s ← DNNHeuristics(s, trials)
    if s < bests then
        bests ← s
    s ← MIPReconstruction(s)
    if s < bests then
        bests ← s
until stop condition met
return bests
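The following is a compact Python rendering of Algorithm 1. It is only a sketch: dnn_heuristics, mip_reconstruction and penalty stand for the components described in Sections 4.1–4.3, and the wall-clock budget is an assumed stop condition.

```python
import time

def mip_dnn(s, dnn_heuristics, mip_reconstruction, penalty,
            trials=20, time_limit=600.0):
    """Alternate DNN-guided improvement and MIP reconstruction (Algorithm 1)."""
    best = s
    deadline = time.time() + time_limit
    while time.time() < deadline:        # stop condition
        s = dnn_heuristics(s, trials)    # targeted minor changes
        if penalty(s) < penalty(best):
            best = s
        s = mip_reconstruction(s)        # large-scale structural change
        if penalty(s) < penalty(best):
            best = s
    return best
```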
Our method proceeds in the following steps. First, we train the DNN model. Then, we use the DNN model to select a method for improving the current solution until no further improvements are possible. The local optimal solution is input into the MIP solver to destroy and reconstruct some of the structure using certain strategies. Next, we comprehensively describe each step of the proposed method.

4.1. Model development and training

DNNs can be used for classification and regression problems. The output layer of a classification DNN has a set of neurons, and the output space consists of a set of discrete values. The output layer of a regression DNN has only one neuron, and the output is a real value in the range [0, 1]. Based on this difference, we used a regression DNN in this study. The DNN is used to evaluate the quality of a feasible solution, that is, to score feasible solutions. A classification DNN divides the scores into fixed levels, such as a five-level system, which limits the performance of the DNN. For example, consider a situation in which solution A is slightly better than solution B but the difference between the two is not large; in this case, a classification DNN may not register the gap and may assign the same score to both solutions. Therefore, we need a regression DNN, which outputs a continuous value. To enable our model to learn what a good feasible solution looks like, we must also construct a dataset

Table 2
Description of the handcrafted input features.

Static features       Description                                                                Count   Reference
Ratio.Nurse           Ratio of the number of nurses assigned each day to the total number of     4       [47]
                      nurses (mean, standard deviation, min, and max)
Ratio.PenaltyNurse    Ratio of the penalty value caused by each nurse to the current penalty     4       [47]
                      value (mean, standard deviation, min, and max)
Ratio.PenaltyDay      Ratio of the penalty value on each day to the current penalty value        4
                      (mean, standard deviation, min, and max)
Ratio.Violation       Ratio of the number of violations of each soft constraint to the total     4
                      number of soft-constraint violations

Dynamic features      Description                                                                Count   Reference
PseudoOp              Ratio of the ideal optimal solution to the current best solution based     1       [11]
                      on the current change
Depth                 Depth of the node                                                          1       [48]
ConstrDegree          Number of constraints involved in the current change                       1       [11]
ImpValue              Ratio of the improvement to the current best solution                      1

with a set of input data and the corresponding output data. In this study, we used features and scores as the input and output of the DNN model, respectively.

4.1.1. Features and scores

It is challenging to determine whether a feasible solution is ''good''. We cannot determine this based only on the penalty value because doing so might cause the algorithm to fall into local optima during the search process. Some feasible solutions may not have a lower penalty; however, they may still be improved toward the optimal solution. Hence, we used the score as a standard for evaluating feasible solutions. A higher score indicates that the corresponding solution is more likely to be improved to the optimal solution. The score of a feasible solution is the output obtained by feeding that solution into the DNN model. Therefore, we must calculate the score manually as the label of the data in the training set when training the DNN model. The detailed definition of the score is given in Section 4.1.2.

However, the input to the DNN model should be a type of data that can describe the current feasible solution. One method is to use the solution directly as the input. However, because different specific instances have different scales, a DNN model trained in this manner can be applied only to specific instances. Features can be used to solve this problem, as a set of features can accurately describe the state of a feasible solution. Furthermore, because the dimension of a feature vector is fixed, our model can learn from small-scale instances and apply this knowledge to solve large-scale instances. The features that were used are listed in Table 2. We collected commonly used features that had been verified in previous studies [11,47,48] and decided which features to use based on our evaluation. We also designed some new features related to the NRP and MIP-DNN approaches, such as the ratio, violation and ImpValue features. All features can be classified as either static or dynamic. Static features describe the state of the current solution, whereas dynamic features describe specific details of the improvement. We used the conventional min–max normalization method to normalize the value of each feature to [0, 1] (see Table 2).

4.1.2. Data collection

We constructed a dataset based on previous methods, which are listed in column 3 of Table 3. We repeatedly applied these methods to small benchmark instances 1–4 and recorded the features (input) and corresponding scores (output) during the search process. The scores depend on the number of heuristic changes required for the current solution to become the optimal solution. For the NRP, the process of determining the final solution $s_m$ based on a given initial feasible solution $s_0$ for a specific instance through predetermined heuristics can be described as the sequence $\{(s_0, h_0), \ldots, (s_m, h_m)\}$. In this sequence, during step $k$, solution $s_k$ becomes $s_{k+1}$ through heuristic strategy $h_k$, where $m$ is the total number of steps required to transform feasible solution $s_0$ into $s_m$. For the change in each step, we record an example $(x_k, y_k)$ as training data, where $x_k = fea_k$, with $fea_k$ being the feature vector corresponding to solution $s_k$, and $y_k$ represents the score of $fea_k$, which is defined as

$$y_k = \frac{k}{m} \times \frac{p^*}{p_k}, \tag{12}$$

where $p_k$ is the penalty value of solution $s_k$ and $p^*$ is the best-known penalty for this instance.

After the data were collected, we obtained a dataset based on small benchmark instances, including the features (input) and scores (output), which can be used to train the DNN model. Table 3 shows the instances, the total amount of recorded data, and the references corresponding to the methods used.

Table 3
List of instances for initial training.

Instance     Count   Reference
Instance 1   1124    [12]
Instance 2   1348    [23]
Instance 3   2096    [28]
Instance 4   3259    [12]

4.1.3. Parameter settings and training

We used a regression DNN with multiple layers. The parameter settings of the DNN model are shown in Table 4. All layers are dense, i.e., fully connected, layers. The first layer is an input layer with 20 nodes, which equals the number of features described in Section 4.1.1. The middle layers are hidden layers, and the dimensionality is reduced from 20 to 2 by halving the number of nodes at each step. Each node in the hidden layers uses an activation function defined as ReLU(x) = max{0, x}. The output layer contains only one node. Its activation function is a sigmoid function, defined as $S(x) = 1/(1 + e^{-x})$, which outputs a value in the range [0, 1] as the score.

We implemented the data collection and training algorithm in Python 3.7 using Keras 2.3.0 with TensorFlow 2.0.0 as the backend for the DNN implementation. During training, the dataset is randomly divided into two parts in an 80%–20% proportion: 80% of the data form the training set, containing 6262 examples, and 20% form the test set, containing 1565 examples. The DNN is trained using the Adam optimizer [49], which is based on gradient descent. The learning rate is the default value 0.0001, the number of epochs is fixed at 100, and the batch size is 63.
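The labeling, normalization and model definition of Sections 4.1.1–4.1.3 fit into a short script. The sketch below uses the reported Keras/TensorFlow versions; the loss function (mean squared error) is our assumption, since the paper specifies the optimizer, learning rate, epochs and batch size but not the loss, and the 20% validation split here only mimics the reported train/test proportion.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def score_labels(penalties, best_known):
    """y_k = (k / m) * (p* / p_k) for a recorded trajectory of m steps (Eq. (12))."""
    m = len(penalties) - 1                       # assumes at least one step (m >= 1)
    return [(k / m) * (best_known / p) for k, p in enumerate(penalties)]

def minmax_normalize(f, f_min, f_max):
    """Conventional min-max normalization of each feature to [0, 1]."""
    return (f - f_min) / np.maximum(f_max - f_min, 1e-9)

def build_scoring_dnn():
    """The 20-16-8-4-2-1 fully connected regression DNN of Table 4."""
    model = keras.Sequential([
        layers.Dense(16, activation='relu', input_shape=(20,)),
        layers.Dense(8, activation='relu'),
        layers.Dense(4, activation='relu'),
        layers.Dense(2, activation='relu'),
        layers.Dense(1, activation='sigmoid'),   # continuous score in [0, 1]
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss='mse')
    return model

# model = build_scoring_dnn()
# model.fit(features, scores, epochs=100, batch_size=63, validation_split=0.2)
```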

Table 4
Parameter settings of the DNN model.

Type of layer   Dimension of input and output   Activation function
Dense           (20, 16)                        ReLU
Dense           (16, 8)                         ReLU
Dense           (8, 4)                          ReLU
Dense           (4, 2)                          ReLU
Dense           (2, 1)                          Sigmoid
• H1: Swap 2 shifts between 2 nurses on the same day.
• H2: Swap 2 blocks of 2 consecutive shifts between 2 nurses
on the same day.
6262 pieces of data, and 20% of the data are the test set, con- • H3: Swap 2 blocks of 3 consecutive shifts between 2 nurses
taining 1565 pieces of data. The DNN is trained using the Adam on the same day.
optimizer [49], which is based on gradient descent. The learning • H4: Swap 2 blocks of 4 consecutive shifts between 2 nurses
rate is the default value 0.0001, and the number of epochs is fixed on the same day.
at 100; thus, the batch size is 63. • H5: Swap 2 blocks of 5 consecutive shifts between 2 nurses
on the same day.
4.2. Improvements with the DNN model • H6: Swap 2 shifts on different days for the same nurse.
• H7: Swap 2 blocks of 2 consecutive shifts on different days
After we obtain the trained DNN model and a feasible initial for the same nurse.
solution, we must improve the initial solution and select the • H8: Swap 2 blocks of 3 consecutive shifts on different days
best solution according to different improvements. There are for the same nurse.
several heuristic techniques for altering the current solution and • H9: Swap 2 blocks of 4 consecutive shifts on different days
obtaining variants, such as VNS. However, these techniques are for the same nurse.
insufficient; we need as many different types of heuristics as pos- • H10: Swap 2 blocks of 5 consecutive shifts on different days
sible to generate a relatively large number of candidates to ensure for the same nurse.
that our DNN is more likely to select the best candidate. Hence,
we selected 10 heuristics that have been validated in previous As illustrated in Figs. 2 and 3, these heuristics are useful
studies [12,26]. We also designed some targeted heuristics based for maintaining good blocks, and swapping these blocks can

improve the quality of the solution. However, these heuristics are not targeted improvements, especially heuristics 1–5. These heuristics cannot remedy insufficient or excessive cover, because swapping two blocks within the same columns does not change the number of shifts on the corresponding day. They can, however, reduce violations due to shift-on/off requests.
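To make the perturbation moves concrete, the following is a sketch of H1 (swapping the shifts of two nurses on the same day) under the binary-tensor representation used earlier; the nurse and day indices would be chosen at random or enumerated by the search, and feasibility is checked afterwards.

```python
def swap_same_day(x, i1, i2, d):
    """H1: swap the day-d assignments of nurses i1 and i2 (returns a new roster)."""
    x2 = x.copy()
    x2[i1, d, :], x2[i2, d, :] = x[i2, d, :].copy(), x[i1, d, :].copy()
    return x2   # the caller re-checks the hard constraints (e.g., HC2 rotations)
```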
Repair heuristics: We want to be more precise and improve specific constraint violations. The objective function shows that the weights of the different violations play important roles in determining the objective function value. According to the parameters of the benchmark instances [14], we infer that insufficient cover has the largest weight. Therefore, prioritizing this soft constraint while performing corresponding improvements to other violations may accelerate the search process. Accordingly, we introduce some novel strategies known as repair strategies (a code sketch of one repair move follows the list below):

• H11: Check the parameters for shift-on requests; assign the required specified shift to the specified nurse on the specified day if the request is not met.
• H12: Check the parameters for shift-off requests; cancel the required specified shift for the specified nurse on the specified day if the request is not met. A similar shift is added on the same day for another nurse if this is feasible.
• H13: For all days on which the number of shifts is below the cover, as long as the solution remains feasible, add corresponding shifts for every nurse who has a day off on that day.
• H14: For all days on which the number of shifts exceeds the cover, cancel shifts for the corresponding number of excess nurses on that day. Add the same number of shifts for the same nurses on other days, if feasible.
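The following sketch illustrates H13 under the same array conventions; the cover matrix u and the feasibility callback are simplified placeholders, not the authors' implementation.

```python
import numpy as np

def repair_under_cover(x, u, feasible):
    """H13: on days where cover is short, try adding shifts for off-duty nurses."""
    x = x.copy()
    for d in range(x.shape[1]):
        for t in range(x.shape[2]):
            while x[:, d, t].sum() < u[d, t]:
                off = np.flatnonzero(x[:, d, :].sum(axis=1) == 0)  # nurses off on d
                added = False
                for i in off:
                    x[i, d, t] = 1
                    if feasible(x):        # keep only hard-constraint-feasible adds
                        added = True
                        break
                    x[i, d, t] = 0
                if not added:
                    break                  # no feasible nurse left for this shift
    return x
```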
Fig. 4. Repair heuristics: Heuristics 11–12.

Fig. 5. Repair heuristics: Heuristics 13–14.

As shown in Figs. 4 and 5, the repair heuristics attempt to reduce violations associated with insufficient cover while simultaneously improving other violations. Note that in H12 and H14, when we want to cancel some shifts, we need an additional operation. As illustrated in Fig. 4, according to the shift-off request, shift E should not be assigned to nurse 3 on Saturday; hence, we must cancel this shift. However, we must add another shift E on Saturday for another nurse (e.g., nurse 2) to ensure that this cancellation does not trigger an insufficient cover violation. A similar situation is shown in Fig. 5. According to the cover request, at least 1 shift must be added on Monday and Sunday; hence, we add a new shift E for nurse 2 on Monday and a new shift E for nurse 1 on Sunday. Because the number of E shifts on Saturday exceeds the cover, we cancel one and add another shift E for the same nurse on Monday, because this solution is feasible. We attempt to reduce violations while ensuring that these adjustments do not generate new high-weight violations.

Algorithm 2 shows the pseudocode of the DNN-assisted heuristics algorithm. Given a feasible initial solution s, the main loop is iterated until a set number of iterations pass without improvement (trialsSet). In each iteration, a new solution (s′) is generated by applying each heuristic (heuristic) to the current solution (currentSol). If s′ is feasible, it is added to the set of candidates (CandidateList); next, its feature vector (featureVector) is computed from s′, and finally the score (score) is calculated using the DNN (DNN()) and added to the set of scores (ScoringList). Otherwise, s′ is discarded. Second, CandidateList is sorted in decreasing order using ScoringList as the key. Third, if s′′, the candidate explored next (the first solution in CandidateList), is better than the current best solution (bestSol), then s′′ replaces bestSol, s′′ replaces currentSol, and trials is reset; otherwise, s′′ only replaces currentSol.

The DNN model plays a key role in this process. Some approaches apply multiple heuristics through simple iterations until the objective function value can no longer be improved. However, this approach may encounter issues because iterating all heuristics is not purposeful. In each specific step, not all heuristics can be applied, and there may be a considerable number of invalid heuristics. Other algorithms implement simple selection policies; for example, [50] introduced three move selection strategies for determining the next neighborhood in their heuristic method, and [51] used a weighted tabu search for the selection process. However, suboptimal choices may increase the number of iterations or the risk of local optima, reducing the efficiency of the algorithm. Using the learning mechanism described in Section 4.1, the DNN model attempts to make a good choice during each iteration. DNN-assisted heuristics can increase the likelihood of making good decisions by learning features and increasing the amount of training. If the best solution is not selected because of inaccuracy or errors, the DNN model can still learn to avoid choosing the worst option. Even if the worst heuristic is selected, the DNN model has a strong tendency to choose good heuristics during the next step. Furthermore, the error is corrected during the next iteration because the DNN model selects the heuristic based on the score, and the model has already learned a considerable amount of information during training, ensuring the accuracy of the score. Therefore, due to this logic in the algorithm design, one invalid change causes less damage to the solution than the 13 invalid changes of a brute-force iteration.

Algorithm 2 DNNHeuristics(s, trialsSet)
Require: s, trialsSet
Ensure: bestSol
currentSol ← s
bestSol ← s
trials ← trialsSet
while trials > 0 do
    CandidateList ← ∅; ScoringList ← ∅
    for heuristic ∈ HeuristicsList do
        s′ ← heuristic(currentSol)
        if feasible?(s′) then
            CandidateList ← append(CandidateList, s′)
            featureVector ← getfeatures(s′)
            score ← DNN(featureVector)
            ScoringList ← append(ScoringList, score)
        end if
    end for
    CandidateList ← sort(CandidateList, key = ScoringList)
    s′′ ← first(CandidateList)
    if better(s′′, bestSol) then
        bestSol ← s′′
        currentSol ← s′′
        trials ← trialsSet
    else
        currentSol ← s′′
        trials ← trials − 1
    end if
end while
return bestSol
4.3. Reconstruction with an embedded solver

When the repair strategy approaches its limit, a local optimum is reached. In this case, minor changes such as the repair strategies can no longer improve the current solution; changes that alter the structure of the solution matrix are required, that is, large-scale changes. The current solution is input into the MIP solver, and the solver is used to reconstruct parts of the current solution. A feasible solution with a different structure can be obtained, broadening the search space of the hybrid method.

To determine which areas must be reconstructed, [12] considered each cell to be the intersection of a particular day and a particular nurse, which can be designated by a value equal to the proportion of that cell in the total number of violations relative to the current solution. We utilize this cell penalty concept but change the calculation [12] of the cell penalty from the ratio of violated constraints to the weighted sum of the different penalty values, which is closer to the final goal of optimization, i.e., the sum of all penalty values, and is easier to implement in programming. The specific calculation method is presented in Eq. (13):

$$p_{cell} = p_{on} + p_{off} + \frac{p_{cover}}{n_E}, \tag{13}$$

where $p_{cell}$ represents the cell penalty of a specific cell, and $p_{on}$ and $p_{off}$ represent the penalty values caused by the shift-on request and shift-off request soft constraints that the cell violates, respectively. These two constraints are specific to a certain nurse on a certain day; hence, there is no need to share them with the other cells. $p_{cover}$ represents the total penalty value of the shift cover soft constraint violated on the day represented by the cell, which is caused by improper shift arrangements for all nurses on that day. We divide this value by the total number of nurses ($n_E$) and allocate it to each nurse. These cell penalties can be aggregated in different dimensions to represent the penalty value caused by a single nurse, a single day, or different blocks, so we can identify the contribution of each nurse and each day to the total penalty in the current solution.

Fig. 6. Determining the area to be reconstructed.

We follow the description below to determine the area to be reconstructed:

• High-penalty nurse: Nurses with penalty values in the top 30% of the total number of nurses.
• High-penalty day: Days with penalty values in the top 30% of the total number of days.
• High-penalty block: Cells with penalty values in the top 10% are selected, and these cells are used as center points to determine blocks with lengths equal to the maximum number of continuous working days. All cells in a block are selected as part of the area to be reconstructed.

As illustrated in Fig. 6, all cells in the second row and the first column were chosen because nurse 2 and Monday had the highest penalty values, respectively. The area around the red central cell was also chosen because this cell had a penalty value of 150, the largest value in the table.

The percentages used to select the reconstruction area are based on the experimental results of [12]; however, the specific rules are completely different. For example, in the last rule, we select high-penalty blocks instead of simply changing the high-penalty cell. Because the hard constraints are continuous constraints, the neighborhood of the cell does not improve during reconstruction. When most cells are fixed, individual changes to certain cells are likely to violate the hard constraints and reproduce the original result, which is contrary to our goal of making large-scale changes.

After the area to be reconstructed is determined, we can fix the values of the other areas and use the MIP solver to determine a solution. It is crucial to quickly obtain a solution with a different structure when a local optimum is encountered. This goal cannot be achieved by iterating multiple small-scale changes or by defining complex blocks. However, if most variables are fixed, the MIP solver can solve the problem quickly.
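Eq. (13) and the three selection rules map naturally onto array operations. The sketch below is an illustration under assumed conventions: p_on and p_off are (nurses × days) penalty matrices, p_cover is a per-day cover penalty vector, and c_max stands for the maximum number of continuous working days used as the block length — all names are hypothetical.

```python
import numpy as np

def reconstruction_mask(p_on, p_off, p_cover, c_max):
    """Select cells to rebuild from cell penalties (Eq. (13)) and the three rules."""
    n_nurses, n_days = p_on.shape
    p_cell = p_on + p_off + p_cover[None, :] / n_nurses   # Eq. (13)

    mask = np.zeros_like(p_cell, dtype=bool)
    k_n = max(1, int(0.3 * n_nurses))                     # top 30% of nurses
    mask[np.argsort(p_cell.sum(axis=1))[-k_n:], :] = True
    k_d = max(1, int(0.3 * n_days))                       # top 30% of days
    mask[:, np.argsort(p_cell.sum(axis=0))[-k_d:]] = True

    k_c = max(1, int(0.1 * p_cell.size))                  # top 10% of cells
    flat = np.argsort(p_cell, axis=None)[-k_c:]
    for i, d in zip(*np.unravel_index(flat, p_cell.shape)):
        lo, hi = max(0, d - c_max // 2), min(n_days, d + c_max // 2 + 1)
        mask[i, lo:hi] = True                             # block centered on the cell
    return mask    # True = variables left free for the MIP solver to rebuild
```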
5. Results

We apply our algorithm to the nurse rostering instances of the scheduling benchmark datasets [14]. These instances are designed to reflect real-world requirements and scheduling scenarios. Table 5 lists the instances, their dimensions (weeks, nurses and shift types) and their soft constraints (day off requests, shift-on/off requests and cover requests). The dimensions of instances 5 to 24 range from small (4 weeks, 16 nurses and 2 shift types) to very large (52 weeks, 150 nurses and 32 shift types). The numbers of the three different types of requests (columns 5, 6 and 7 in Table 5) can be regarded as indicators of the difficulty of an instance. In general, according to our experiments, the greater the number of requests, the more difficult the instance is to solve, with increased computational time and memory. In particular, instances 20 to 24 are computationally challenging.

Table 5
Parameters of the benchmark instances.

Instance   Weeks   Nurses   Shift types   Day off requests   Shift-on/off requests   Cover requests
5          4       16       2             32                 106                     896
6          4       18       3             36                 135                     1512
7          4       20       3             40                 168                     1680
8          4       30       4             60                 225                     3360
9          4       36       4             72                 232                     4032
10         4       40       5             80                 284                     5600
11         4       50       6             100                336                     8400
12         4       60       10            120                422                     16 800
13         4       120      18            240                841                     60 480
14         6       32       4             128                359                     5376
15         6       45       6             180                490                     11 340
16         8       20       3             120                280                     3360
17         8       30       4             160                480                     6720
18         12      22       3             176                414                     5544
19         12      40       5             320                834                     16 800
20         26      50       6             900                2318                    54 600
21         26      100      8             1800               4702                    145 600
22         52      50       10            1800               4638                    18 200
23         52      100      16            3600               9410                    582 400
24         52      150      32            5400               13 809                  1 747 200

We tested the algorithm in three experiments. In Experiment I, we used three different combinations and compared the performance with and without DNN model assistance and with different heuristic strategies. This experiment verified that the DNN model improves the proposed algorithm, as well as the effectiveness of our set of 14 heuristic strategies. Experiment II evaluated the performance when a DNN model trained on data generated by a deep Q-network (DQN) was embedded in the proposed algorithm, to verify the impact of different training data acquisition methods on the performance of the DNN model. Experiment III compared the MIP-DNN model with 3 traditional approaches and 3 hybrid approaches, using the optimal configurations obtained in Experiments I and II, and conducted a series of statistical tests to verify the performance of the MIP-DNN model and compare its competitiveness with state-of-the-art approaches.

We conducted these experiments on a PC (Intel Core-i5, 3.4 GHz, 8 GB RAM) running Windows 7 and implemented our algorithm in Python 3.7 using Keras 2.3.0 with TensorFlow 2.0.0 as the backend for the DNN implementation.

5.1. Experiment I

We used three different combinations to evaluate the performance of the DNN model and the different heuristic strategies.

• Combination 1: Improve with the 14 heuristics mentioned above and choose with the DNN model.
• Combination 2: Iterate the 14 heuristics until the quality of the solution can no longer be improved, without the DNN model.
• Combination 3: Improve the solution with the 5 heuristics described in [26] and choose the final solution with the DNN model.

The first combination is the configuration developed in this study. The second combination adopts the same heuristic set but, instead of using the DNN model to select the solution, improves the solution through repeated iterations; it is used to verify the effectiveness of the DNN model. The third combination simplifies the set of heuristics and uses only the five horizontal swaps of [26] as perturbations guided by the DNN model, verifying the effectiveness of our larger heuristic set.

We ran the algorithms on benchmark instances 5–24 for 10 min, because instances 1–4 were used as the training datasets. The best penalties found within 10 min are presented in Table 6.

Table 6
Results of three combinations running for 10 min.

Instance   Combination 1   Combination 2   Combination 3
5          1146            2141            1150
6          1950            1950            1950
7          1056            2401            1082
8          2080            1796            1781
9          445             552             442
10         4632            4735            4634
11         3545            5184            3443
12         5278            7544            6212
13         3582            29 808          7552
14         1896            2330            2118
15         7095            7177            7182
16         4409            7249            4409
17         6910            8186            8294
18         4835            14 522          5967
19         9074            8685            9122
20         7406            16 094          10 713
21         127 704         381 243         148 989
22         183 484         421 705         210 625
23         227 384         412 842         279 602
24         7 178 589       1 178 536       17 186 692

As shown in Table 6, Combination 1 performs better than the other combinations, except for instance 24. The reason for this result might be that instance 24 has the largest size, and the connections among the features may be more complicated; thus, it is a challenge to ensure that the correct heuristic is selected at each choice. Although the solution might have a difficult structure that cannot be explored using Combinations 1 and 3, the problem can be solved by iterating all heuristics. Combination 2 exhibits optimal performance only in rare cases, and it easily falls into local optima due to its purposeless iteration method. Thus, we introduce the DNN to replace the large number of repeated iterations with a more targeted selection. With the DNN model, Combination 3 exhibits good performance. However, by using more heuristics, Combination 1 has more choices with which to improve the feasible solution. As a result, Combination 1 has a higher diversification rate than Combination 2.

Algorithm 3 The training process of DQN with experience replay.
Require: Episodes, learning rate α, discount factor γ, network update interval C.
Ensure: Optimized action-value function Q(θ).
Initialize replay memory D to capacity N
Initialize action-value function Q(θ) with random weights θ
Initialize target action-value function Q̂(θ̂) with θ̂ ← θ
repeat
    Initialize start state ← features
    repeat
        Select an action with the ϵ-greedy policy
        Execute action and obtain reward and next state′
        Store (state, action, reward, state′) in D
        Sample random (state_j, action_j, reward_j, state′_j) from D
        Set the target
            y_j = reward_j, if state′_j is a termination state;
            y_j = reward_j + γ max_{action′} Q̂(state′_j, action′; θ̂), otherwise.
        Update θ according to the Adam optimization algorithm
        Update state ← state′
        Reset θ̂ ← θ in Q̂(θ̂) every C steps
    until state is a termination state
until all episodes are visited
return Q(θ)
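The core of Algorithm 3 can be sketched in a few lines of Python. This is an illustration only: q_net and target_net are assumed Keras models mapping the 20-dimensional feature vector to one Q-value per heuristic, replay tuples are (state, action, reward, next_state, done), and whether ϵ denotes the exploration probability is our reading of the ϵ-greedy setting reported below.

```python
import random
import numpy as np

def select_action(q_net, state, epsilon=0.8, n_actions=14):
    """ϵ-greedy selection over the 14 heuristic actions of Section 4.2."""
    if random.random() < epsilon:            # explore with probability ϵ
        return random.randrange(n_actions)
    return int(np.argmax(q_net.predict(state[None, :])[0]))

def replay_update(q_net, target_net, batch, gamma=0.9):
    """One experience-replay step using the target of Algorithm 3."""
    states = np.array([b[0] for b in batch])
    next_states = np.array([b[3] for b in batch])
    targets = q_net.predict(states)          # start from current estimates
    next_q = target_net.predict(next_states).max(axis=1)
    for j, (_, action, reward, _, done) in enumerate(batch):
        targets[j, action] = reward if done else reward + gamma * next_q[j]
    q_net.fit(states, targets, epochs=1, verbose=0)   # Adam update toward targets
```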

5.2. Experiment II We show some of the training data generated with the DQN
model on instances 1–4 in the scatterplot in Fig. 7. The two red
dotted lines in the scatterplot represent the penalty value of the
In this section, we used an unsupervised learning method
best-known solution and the score, which has a maximum of 1.
to train the neural network; specifically, the training data were
The shade of the points represents the number of overlapping
generated by a deep Q-network (DQN) model, which is a state-
points in the area; the more points there are in the area, the
of-the-art deep reinforcement learning model that combines deep
darker the color is. The scatterplot in 7 shows that most of the
learning and reinforcement learning. By generating training data
scores for the training data generated by the DQN model on
with different methods, we can verify the impact of different
instances 1–4 are less than 0.6, and the corresponding penalty
training data acquisition methods on the performance of the DNN
values are considerably worse than the best-known penalty value.
model. To distinguish between different DNN models, the DNN
However, the training data generated by the method described in
model trained by the data obtained with the method described
Section 4.1.2 can obtain feasible solutions with higher scores. For
in Section 4.1.2 is called DNN-H, and the DNN model trained by
more details, please refer to [12,23,28]. Therefore, the two DNN
the data generated with the DQN method is called DNN-Q.
models trained on these two sets of training data have distinct
effects when embedded in the MIP-DNN model; we will analyze
these effects later.
5.2.1. DQN training and generated data
First, we train the DQN model and use the trained DQN model to generate training data, including features (input) and scores (output). Then, the training data are used to train the DNN-Q model. Our work does not specifically study the application of DRL methods to the NRP, so we adopt the classical method to train the DQN. The training process is shown in Algorithm 3. In the DQN model, we use the feature vector described in Section 4.1.2 as the state. The 14 heuristics mentioned in Section 4.2 are the action set, and each heuristic corresponds to an action. Since the reward must be larger for better solutions, we take the negative value of the penalty of the solution corresponding to the feature vector as the reward. The parameter θ of the Q-network is trained with the Adam optimizer [49]. The learning rate is the default value of 0.0001, the discount factor γ is 0.9, the ϵ value of the ϵ-greedy policy is 0.8, and the number of epochs is 1000. For specific details and principles, please refer to [52]. After training the DQN model, we run it on instances 1–4 until the results converge and repeat this process 10 times on each instance. During the training process, the feature vector, the selected heuristic and the corresponding feasible solution of each step are completely recorded. Finally, we use the data collection method described in Section 4.1.2 to process the recorded data and obtain the training data generated by the DQN model.

In the scatterplot in Fig. 7, the more points that overlap, the darker the color is. The scatterplot shows that most of the scores for the training data generated by the DQN model on instances 1–4 are less than 0.6, and the corresponding penalty values are considerably worse than the best-known penalty values. However, the training data generated by the method described in Section 4.1.2 can obtain feasible solutions with higher scores. For more details, please refer to [12,23,28]. Therefore, the two DNN models trained on these two sets of training data have distinct effects when embedded in the MIP-DNN model; we will analyze these effects later.
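To make the setup above concrete, the following minimal PyTorch sketch shows how the state, action and reward mapping described in this section can be wired into a Q-network update. The feature dimension, the network shape and the exploration convention are illustrative assumptions, and the replay buffer and target network of the full DQN [52] are omitted.

import torch
import torch.nn as nn

FEATURE_DIM, N_HEURISTICS = 64, 14          # feature length is an assumption

# Q-network: maps a solution's feature vector to one Q-value per heuristic.
q_net = nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, N_HEURISTICS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)  # values reported above
gamma, epsilon = 0.9, 0.8

def select_heuristic(state):
    # epsilon-greedy over the 14 heuristics; treating epsilon as the
    # exploration probability is an assumption about the convention used.
    if torch.rand(1).item() < epsilon:
        return torch.randint(N_HEURISTICS, (1,)).item()
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_step(state, action, penalty, next_state):
    # The reward is the negative penalty of the resulting solution.
    reward = -float(penalty)
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    loss = (q_net(state)[action] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

state = torch.randn(FEATURE_DIM)             # placeholder feature vector
a = select_heuristic(state)
td_step(state, a, penalty=1500.0, next_state=torch.randn(FEATURE_DIM))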

Fig. 7. Training data generated by the DQN on instances 1–4.

Fig. 8. Variation in the penalty value over time for algorithms with different embedded DNN models on instance 10.

5.2.2. Comparison and analysis

The results of applying the two MIP-DNN algorithms with different embedded DNN models to the benchmark instances are shown in Table 7. Based on the performance on each instance, the MIP-DNN algorithm embedded with the DNN-H model outperforms the MIP-DNN algorithm embedded with the DNN-Q model. Because the performance gap is clear, we do not measure other variabilities. The reason why the MIP-DNN algorithms embedded with different DNNs perform differently can be inferred by analyzing Fig. 8. This figure shows the variation in the penalty value over time as the two MIP-DNNs solve instance 10. We choose instance 10 because its initial feasible solution has a penalty of 16702 and the best-known solution is 4631; compared with the other instances, the span on the y-axis is smaller, which is convenient for the visual display. Nevertheless, we still must zoom in on some parts of Fig. 8 to show details that are hidden due to scale issues. The middle part of Fig. 8 is the main figure, and the four small figures above and below it are partial enlargements of the main figure. The y-axis represents the penalty value, and the x-axis represents the time in seconds. The different colored lines (pink and blue) connect the optimal solutions of the two MIP-DNNs at different times.

As previously described, the MIP-DNN algorithm has two parts: the DNN improvement part and the MIP reconstruction part. Both components are shown in the figure. The pink and blue shaded parts represent the process of reconstructing the structure of the local optimal solution with the MIP solver after the DNN improvement part assisted by the DNN-Q or DNN-H model, respectively, falls into a local optimum. The black triangles represent the local optimum determined by the DNN improvement part (the input to the MIP solver), and the pink triangles represent the solution reconstructed by the MIP solver. Since the different embedded DNN models operate on only the DNN improvement part, we mainly focus on the nonshaded sections of the two lines.
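The loop that produces these curves can be summarized as follows. This is a minimal sketch under assumed interfaces: features, dnn.best_heuristic, mip_reconstruct and the stall counter are illustrative names, not the paper's actual code.

import time

def mip_dnn_loop(solution, features, dnn, heuristics, mip_reconstruct,
                 time_limit=3600, stall_limit=50):
    """Sketch of the MIP-DNN loop: DNN improvement + MIP reconstruction."""
    best, start, stalls = solution, time.time(), 0
    while time.time() - start < time_limit:
        # DNN improvement part: the DNN picks the most promising heuristic.
        h = heuristics[dnn.best_heuristic(features(best))]
        candidate = h(best)
        if candidate.penalty < best.penalty:
            best, stalls = candidate, 0
        else:
            stalls += 1
        # MIP reconstruction part: escape the local optimum (black triangle)
        # by rebuilding part of the solution (pink triangle).
        if stalls >= stall_limit:
            best = mip_reconstruct(best)
            stalls = 0
    return best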


Table 7
Results of the two MIP-DNN algorithms with different embedded DNN models for a run time of 60 min.

Instance   MIP-DNN-H              MIP-DNN-Q
           Penalty     Gap        Penalty      Gap
5          1145        0.17%      1237         8.22%
6          1950        0.00%      2048         5.03%
7          1082        2.46%      1122         6.25%
8          1322        1.69%      1359         4.54%
9          440         0.23%      440          0.23%
10         4631        0.00%      4979         7.51%
11         3443        0.00%      7958         131.14%
12         4153        2.80%      8879         119.78%
13         2769        105.42%    13 255       883.31%
14         1297        1.49%      1745         36.54%
15         5920        54.53%     7169         87.13%
16         3351        3.91%      3552         10.14%
17         5748        0.03%      7094         23.46%
18         4954        11.10%     8247         84.95%
19         4338        37.76%     15 909       405.21%
20         5905        23.82%     9399         97.09%
21         22 282      5.44%      164 135      676.68%
22         53 546      77.05%     92 733       206.62%
23         38 752      122.35%    227 384      1204.71%
24         402 049     846.82%    3 152 493    7324.09%

Table 8
Results of the MIP-DNN and classic approaches for a run time of 60 min.

Instance   MIP-DNN    Classic approach
                      Ejection chain   B&P      Gurobi
5          1145       1358             1160     1143
6          1950       2258             1952     1950
7          1082       1269             1058     1056
8          1322       2260             1308     1323
9          440        463              439      439
10         4631       4797             4631     4631
11         3443       3661             3443     3443
12         4153       5211             4046     4040
13         2769       2663             /        3109
14         1297       1874             /        1280
15         5920       5935             /        4964
16         3351       4048             3323     3233
17         5748       7835             /        5851
18         4954       6404             /        4760
19         4338       5531             /        5420
20         5905       9750             /        /
21         22 282     36 688           /        /
22         53 546     516 686          /        /
23         38 752     54 384           /        /
24         402 049    156 858          /        /

Based on the overall performance of the two MIP-DNNs within 60 min, at the beginning of solving the instance, when the current solution penalty value is high, the MIP-DNN algorithm with the embedded DNN-Q model optimizes the solution slightly more than the MIP-DNN algorithm with the embedded DNN-H model in the DNN improvement part. When the algorithm approaches the best-known solution, although the DNN improvement parts of the two MIP-DNNs are almost straight lines in the main figure, the enlarged smaller figures show that the blue lines remain in a downward trend, while the pink lines are straight lines with the same y-axis coordinates. This result illustrates that the MIP-DNN algorithm embedded with the DNN-H model can still improve the solution during the second half of the solution process, that is, when the penalty value of the current solution is close to that of the optimal solution. However, the MIP-DNN algorithm embedded with the DNN-Q model cannot effectively improve the solution at that stage. The scatterplot in Fig. 7 suggests that this is due to the impact of the training data generated by different methods on the trained DNN model. The training data generated by the DQN are distributed only in the initial stage; these training data have higher penalty values and lower scores. The training data generated by the other methods can produce feasible solutions with higher scores. Therefore, in the second half of the solution process, the DNN-H model can improve the solution more than the DNN-Q model. Furthermore, we consider the characteristics of the different methods for generating training data. Although the DQN is a powerful DRL model, it is still difficult to explore an NP-hard combinatorial optimization problem with a random strategy; thus, the DQN generates training data with low scores. The methods discussed in Section 4.1.2 are specifically designed for the NRP; thus, they generate higher-quality training data than other methods. The purpose of introducing the DNN is to learn policies during the search process that are reflected by the training data. This might be why the MIP-DNN algorithm embedded with the DNN-H model performs better than the MIP-DNN algorithm embedded with the DNN-Q model.

5.3. Experiment III

In this section, we compare our algorithm, configured with Combination 1 and the DNN-H model, with six other algorithms. The run times of these algorithms were reported in the relevant literature to be 60 min; therefore, for the convenience of comparison, we also set our run time to 60 min. Table 8 presents a comparison with an ejection chain metaheuristic, a B&P approach, and a Gurobi solver (with an embedded MIP). These approaches are heuristic or exact methods. All results are reported in [46]. The results show that our hybrid method performs better than the classic approaches in terms of generality when solving this problem. Our method performs significantly better than the ejection chain metaheuristic on instances 5–23. For the instances on which the exact methods can produce a feasible solution in the given amount of time, the proposed algorithm and the other two methods all demonstrate their competitiveness, and the best solution obtained by each method either reaches or is very close to the proven optimal solution. However, for larger instances, there are 11 and 5 instances for which the B&P and Gurobi methods, respectively, cannot produce a feasible solution within 60 min. The proposed algorithm is applicable to all instances.

To further verify the performance of the proposed algorithm, we performed a series of statistical tests and reported variability measures. The statistical results of the gaps and ranks of the MIP-DNN, ejection chain, B&P and Gurobi approaches are shown in Table 9, where the gap is the relative difference between the result obtained by the corresponding approach and the best-known solution, and the rank is the ranking of the result obtained by the corresponding approach among the four approaches. Fig. 9 shows the statistical results of these four approaches as a mixed box and strip plot. Since B&P and Gurobi cannot obtain feasible solutions within 60 min on 11 and 5 instances, respectively, using only the feasible-solution data for graphing would not fully reflect the difference in the solving ability of the four approaches within 60 min and would be misleading. Therefore, we set the gap of the instances that cannot be solved by B&P and Gurobi to the largest value in Table 9, 1608.39%, and then draw the figure for comparison. Due to the excessively large data span (0%–1608%), we use a broken y-axis to improve the presentation: the y-axis is divided into two parts (0%–500% and 1600%–1610%).
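To make the two statistics concrete, the following short sketch computes the gap and a competition-style rank for one instance; treating unsolved instances as ranked last reflects our reading of Table 9 rather than a documented rule, and the sample data are taken from instance 5 of Table 8.

def gap(penalty, best_known):
    """Relative difference to the best-known solution, as a percentage."""
    return 100.0 * (penalty - best_known) / best_known

def ranks(penalties):
    """Competition ranking among the approaches; None marks an approach
    that found no feasible solution and is ranked last."""
    feasible = [p for p in penalties if p is not None]
    out = []
    for p in penalties:
        if p is None:
            out.append(len(penalties))            # unsolved -> worst rank
        else:
            out.append(1 + sum(q < p for q in feasible))
    return out

best_known = 1143                                  # instance 5
penalties = [1145, 1358, 1160, 1143]               # MIP-DNN, EC, B&P, Gurobi
print([round(gap(p, best_known), 2) for p in penalties])  # [0.17, 18.81, 1.49, 0.0]
print(ranks(penalties))                            # [2, 4, 3, 1]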

Table 9
Gaps and ranks of the MIP-DNN and classic approaches for a run time of 60 min.
Instance MIP-DNN Ejection chain B&P Gurobi
Gap Rank Gap Rank Gap Rank Gap Rank
5 0.17% 2 18.81% 4 1.49% 3 0.00% 1
6 0.00% 1 15.79% 4 0.10% 3 0.00% 1
7 2.46% 3 20.17% 4 0.19% 2 0.00% 1
8 1.69% 2 73.85% 4 0.62% 1 1.77% 3
9 0.23% 3 5.47% 4 0.00% 1 0.00% 1
10 0.00% 1 3.58% 4 0.00% 1 0.00% 1
11 0.00% 1 6.33% 4 0.00% 1 0.00% 1
12 2.80% 3 28.99% 4 0.15% 2 0.00% 1
13 105.42% 2 97.55% 1 / 4 130.61% 3
14 1.49% 2 46.64% 3 / 4 0.16% 1
15 54.53% 2 54.92% 3 / 4 29.57% 1
16 3.91% 3 25.52% 4 3.04% 2 0.25% 1
17 0.03% 1 36.36% 3 / 4 1.83% 2
18 11.10% 2 43.62% 3 / 4 6.75% 1
19 37.76% 1 75.64% 3 / 4 72.12% 2
20 23.82% 1 104.45% 2 / 4 / 4
21 5.44% 1 73.61% 2 / 4 / 4
22 77.05% 1 1608.39% 2 / 4 / 4
23 122.35% 1 212.05% 2 / 4 / 4
24 846.82% 2 269.40% 1 / 4 / 4

As seen in Fig. 9, comparing the results of the four methods overall, the boxplot obtained by MIP-DNN is narrower and lower in position, and the scatter is more concentrated, which means that the gap distribution obtained by MIP-DNN is more uniform and the feasible solutions of MIP-DNN are closer to the best-known solutions. More specifically, the median value of the gaps obtained with the MIP-DNN approach is 4%, which is close to zero, and the mean value is 65%; compared with the corresponding values of 45% and 141% for the ejection chain, these indexes are better. For the outliers, those of MIP-DNN are close to the box, while those of the ejection chain are far from the box. Fig. 9 clearly shows that the variability measures of MIP-DNN are lower than the corresponding values of the ejection chain, including the positions of the quartiles, the positions of the maximum and minimum values, and the position of the mean. Thus, the MIP-DNN method performs better than the ejection chain method. This conclusion is reinforced by the distribution of the scatters. Most gap values obtained by the MIP-DNN approach are concentrated in the range of 0% to 10%; while a small part is greater than 10%, the data are relatively concentrated. The gap values obtained by the ejection chain method are evenly dispersed between 0% and 120%, and the data are relatively discrete. The gap values of B&P and Gurobi show extreme distributions: the distribution of feasible solutions is very concentrated, but at the same time, there are 11 and 5 instances, respectively, that cannot be solved within 60 min. Therefore, the boxes corresponding to B&P and Gurobi are particularly high, and most of the corresponding variability measures are also worse. This also reflects the weak solving ability of these two approaches on large-scale instances and their limited scope of application.

Fig. 9. Statistical tests on the gaps of the classic approaches and MIP-DNN.

After the analysis of the gap values, we also designed a statistical test for the rank values. The statistical results are shown in Fig. 10, which is also a mixed box and strip plot. As seen from the box plot, the heights of the maximum values, quartile values and mean values in the box plot of the MIP-DNN method are all significantly lower than those of the other three approaches, which illustrates that the MIP-DNN approach is superior to the other three approaches in terms of the variability of the rank. Furthermore, the MIP-DNN approach has the smallest range between the maximum and minimum values, indicating that the MIP-DNN method performs better and is more stable than the other methods. The distribution of the scatters in Fig. 10 confirms this conclusion from another perspective. Since the rank is a discrete variable with only four possible values (1, 2, 3 and 4), there are many overlapping scatter points; thus, the scatter points in the strip plot are made transparent to present more information. The more points that overlap, the darker the points are. The strip plot in Fig. 10 shows that most scatter points in the MIP-DNN plot overlap at ranks 1 and 2; only four points have ranks of 3, and no points have a rank of 4. Only a small number of ejection chain results have ranks of 1 and 2, and most results have ranks of 3 and 4. Most B&P and Gurobi results have ranks of 4, which reflects that these two approaches cannot obtain feasible solutions for some instances. Therefore, according to the above statistical analysis, the overall performance of the MIP-DNN method is better and more stable than that of the other three approaches.

In Table 10, we compare our algorithm with other state-of-the-art hybrid methods, including a hybrid CP and iterated local search (CP-ILS) approach, a hybrid DP-VNS approach, and a hybrid IP-VNS approach. The latter two methods were developed for the NRP; the CP-ILS approach was originally designed for the Sudoku problem [53], but the authors also extended it to the NRP benchmark instances, so we include this method in Table 10 for comparison. The CP-ILS results were obtained from [53], the DP-VNS results from [26], and the IP-VNS results from [12]. As shown in Table 10, the proposed algorithm outperformed the CP-ILS approach on 16 instances. Furthermore, the proposed algorithm outperformed the DP-VNS method on 14 instances. There are 20 instances in total; thus, our method outperformed CP-ILS and DP-VNS on 80% and 70% of the instances, respectively.
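For reference, a figure of this type can be assembled with a few lines of matplotlib. The sketch below uses random placeholder data, not the paper's results, and only illustrates the broken y-axis and the transparent, jittered strip overlay described above.

import matplotlib.pyplot as plt
import numpy as np

# Placeholder gap values (%) per approach; unsolved instances are assumed
# to have been capped at 1608.39% beforehand, as described in the text.
gaps = {"MIP-DNN": np.random.uniform(0, 120, 20),
        "Ejection chain": np.random.uniform(0, 1608.39, 20),
        "B&P": np.random.uniform(0, 1608.39, 20),
        "Gurobi": np.random.uniform(0, 1608.39, 20)}

fig, (top, bottom) = plt.subplots(2, 1, sharex=True,
                                  gridspec_kw={"height_ratios": [1, 4]})
labels = list(gaps)
data = [gaps[k] for k in labels]
for ax in (top, bottom):
    ax.boxplot(data, labels=labels, showmeans=True)
    for i, vals in enumerate(data, start=1):       # strip plot overlay
        x = np.random.normal(i, 0.06, size=len(vals))  # horizontal jitter
        ax.scatter(x, vals, alpha=0.4, s=12)       # transparency shows overlap
bottom.set_ylim(0, 500)                            # lower part of broken axis
top.set_ylim(1600, 1610)                           # upper part of broken axis
top.spines["bottom"].set_visible(False)
bottom.spines["top"].set_visible(False)
bottom.set_ylabel("Gap (%)")
plt.show()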


Fig. 10. Statistical test on the ranks of the classic approaches and MIP-DNN.

Fig. 11. Statistical test on the gaps of the hybrid approaches and MIP-DNN.

Table 10
Results for the MIP-DNN and hybrid approaches run for 60 min and the best-known solutions.

Instance   MIP-DNN    Hybrid approach                     Best-known
                      CP-ILS     DP-VNS     IP-VNS        solution
5          1145       1147       1237       1143          1143
6          1950       2050       2141       1950          1950
7          1082       1084       1080       1056          1056
8          1322       1464       1452       1344          1300
9          440        454        446        439           439
10         4631       4667       4656       4631          4631
11         3443       3457       3512       3443          3443
12         4153       4308       4119       4040          4040
13         2769       2961       2120       1905          1348
14         1297       1432       1344       1279          1278
15         5920       4570       4637       3928          3831
16         3351       3748       3458       3225          3225
17         5748       6609       6190       5750          5746
18         4954       5416       5095       4662          4459
19         4338       4364       4281       3224          3149
20         5905       6654       7274       4913          4769
21         22 282     22 549     26 263     23 191        21 133
22         53 546     48 382     56 091     32 126        30 244
23         38 752     38 337     51 699     3794          17 428
24         402 049    177 037    226 490    2 281 440     42 463

Fig. 12. Statistical test on the ranks of the hybrid approaches and MIP-DNN.
From the data in Table 10 alone, we can infer that the MIP-DNN approach performs better than the above two approaches.

In addition, we evaluated the MIP-DNN, CP-ILS, DP-VNS and IP-VNS approaches according to their variability. The statistical results of the gaps and ranks of the four approaches are shown in Table 11. Fig. 11 shows the statistical results of these four approaches as a mixed box and strip plot.

Since the gap values of these four approaches are relatively concentrated, we limit the y-axis to the range from 0% to 100%, which presents the figure better than a broken y-axis; nine gaps could not be shown in the figure. Figs. 11 and 12 clearly show that the IP-VNS approach outperforms the other three approaches on every variability metric. Therefore, we compare the MIP-DNN and IP-VNS approaches separately in the next paragraph and first evaluate the MIP-DNN, CP-ILS and DP-VNS approaches here. As seen from the box plot in Fig. 11, the median value, minimum value and first quartile value of the gaps obtained with the MIP-DNN approach are less than those obtained with the other two approaches; however, the mean value, maximum value and third quartile value are larger. Therefore, in Fig. 11, the box of the MIP-DNN approach is larger, the upper edge is higher, and the lower edge is lower. This illustrates that the data distribution of the MIP-DNN method is more downwardly concentrated before the median value, and the quality of this part of the solutions is higher, but the fluctuation after the median value is larger. The distribution of the scatters in Fig. 11 proves the above analysis more intuitively. It shows that most gap values obtained with the MIP-DNN method are concentrated between 0% and 10%, indicating that the gap values of the high-quality feasible solutions obtained by the MIP-DNN approach are well concentrated; however, the remaining data are distributed between 10% and 140%, which is more volatile than the other two approaches. The gap values are uniformly dispersed between 0% and 30% for the CP-ILS approach and between 0% and 100% for the DP-VNS approach; although the data distributions of CP-ILS and DP-VNS are relatively concentrated, there are obviously fewer high-quality solutions. Therefore, we cannot judge which method performs better based on Fig. 11 alone, so we consider the statistical results of the rank values. As previously mentioned, we make the scatter points in the strip plot in Fig. 12 transparent. As seen from the box plot in Fig. 12, the heights of the quartile values and the mean value in the box plot of the MIP-DNN approach are lower than those of the CP-ILS and DP-VNS approaches.


Table 11
Gaps and ranks of the MIP-DNN and hybrid approaches for a run time of 60 min.
Instance MIP-DNN CP-ILS DP-VNS IP-VNS
Gap Rank Gap Rank Gap Rank Gap Rank
5 0.17% 2 0.35% 3 8.22% 4 0.00% 1
6 0.00% 1 5.13% 3 9.79% 4 0.00% 1
7 2.46% 3 2.65% 4 2.27% 2 0.00% 1
8 1.69% 1 12.62% 4 11.69% 3 3.38% 2
9 0.23% 2 3.42% 4 1.59% 3 0.00% 1
10 0.00% 1 0.78% 4 0.54% 3 0.00% 1
11 0.00% 1 0.41% 3 2.00% 4 0.00% 1
12 2.80% 3 6.63% 4 1.96% 2 0.00% 1
13 105.42% 3 119.66% 4 57.27% 2 41.32% 1
14 1.49% 2 12.05% 4 5.16% 3 0.08% 1
15 54.53% 4 19.29% 2 21.04% 3 2.53% 1
16 3.91% 2 16.22% 4 7.22% 3 0.00% 1
17 0.03% 1 15.02% 4 7.73% 3 0.07% 2
18 11.10% 2 21.46% 4 14.26% 3 4.55% 1
19 37.76% 3 38.58% 4 35.95% 2 2.38% 1
20 23.82% 2 39.53% 3 52.53% 4 3.02% 1
21 5.44% 1 6.70% 2 24.27% 4 9.74% 3
22 77.05% 3 59.97% 2 85.46% 4 6.22% 1
23 122.35% 3 119.97% 2 196.64% 4 −78.23% 1
24 846.82% 3 316.92% 1 433.38% 2 5272.77% 4

Moreover, in the scatters in Fig. 12, most results of the MIP-DNN method have ranks of 1, 2, and 3; six results have a rank of 1, while only one result has a rank of 4. For the CP-ILS and DP-VNS approaches, the scatter points that overlap the most have ranks of 3 and 4. The boxes and scatters in Fig. 12 illustrate that the MIP-DNN approach outperforms the CP-ILS and DP-VNS approaches in terms of rank. Therefore, based on the above statistical analysis, we can conclude that from the perspective of the quality of individual solutions, the MIP-DNN approach is not significantly different from the CP-ILS and DP-VNS approaches; however, based on the ranks over all the instances, the MIP-DNN approach is the most stable.

According to the results of the comparison with the IP-VNS approach, the proposed algorithm produced better results than the IP-VNS method on instances 8, 17, 21 and 24. However, among these better-performing instances, the quality of the result obtained with the proposed algorithm significantly outperforms the IP-VNS result only on instance 24. For instances 8, 17 and 21, the gaps between the results obtained by the two algorithms and the best-known solution are 1.69% and 3.38%, 0.03% and 0.07%, and 5.44% and 9.74%, respectively. Thus, the advantage of the proposed algorithm on these instances is not as clear.

The differences in the results obtained by these two algorithms can be attributed to the different roles played by the heuristic part and the embedded solver in each algorithm. First, we consider the differences in the heuristic parts. The proposed algorithm integrates a variety of heuristics, including 10 horizontal and vertical exchanges and 4 targeted repair heuristics, to improve the quality of the solution under the guidance of the DNN. The IP-VNS approach uses only five vertical exchanges in its heuristic part. Given the characteristics of the NRP, vertical exchanges yield a negligible improvement in the quality of the solution because the penalty value is primarily generated by cover violations. This can explain the marginally superior performance of the proposed algorithm on instances 8, 17 and 21: the proposed algorithm has more options and can thus explore more possibilities under the guidance of the DNN, so it can better improve the current solution. The vertical exchange method of the IP-VNS heuristic part adjusts only the shift-on/off request constraint, which accounts for only a small portion of the overall penalty value.

Next, we concentrate on how the different embedded exact solvers operate. According to the literature [12], the IP solver in the IP-VNS approach plays an important role in the solution process. The solver not only adjusts the structure after the algorithm falls into local optima (in the variable neighborhood descent + IP ruin-and-recreate iteration) but also proves the optimality of the solution or further improves the solution globally after the iteration has finished. Thus, of the three components in the IP-VNS approach, the IP solver dominates; therefore, the overall IP-VNS approach is more likely to produce optimal solutions for small and medium-sized instances. Our algorithm depends more on heuristics, as well as on the knowledge learned by the DNN model, so it produces near-optimal solutions in most cases. Thus, on most instances, the IP-VNS approach performs better than the MIP-DNN method because the IP solver in the IP-VNS approach solves the IP subproblems in a reasonable amount of time. The IP-VNS approach therefore has superior performance as long as the IP solver can reasonably handle the subproblem size. However, once the size of the instance exceeds the ability of the IP solver, the IP-VNS approach shows poor performance, as observed on instance 24, the largest instance. As shown in Table 11, the gap between the result obtained by the IP-VNS approach on instance 24 and the best-known solution is large, reaching 5272.77%. As the number of nurses, the length of the scheduling horizon and the number of shift types increase, the size of the main problem becomes increasingly large. In this case, the IP-VNS approach cannot solve the problem exactly in a reasonable amount of time, demonstrating the advantage of our algorithm. For large instances, the complexity of applying a simple horizontal or vertical exchange (H1–H10) to the solution does not increase with the size of the instance, while the complexities of the repair heuristics (H11–H14) do increase with the instance size, but only linearly, i.e., in O(n). In summary, our algorithm can handle large instances that cannot be solved in a reasonable amount of time by exact methods and thus produces better solutions than these methods on such instances.
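To illustrate the complexity argument, the following minimal sketch contrasts a constant-time exchange with a linear-time repair on a roster stored as a nurses × days matrix; the data structure and function names are illustrative assumptions rather than the paper's implementation of H1–H14.

import numpy as np

# Sketch: roster as a nurses x days integer matrix (0 = day off,
# k > 0 = shift type k). An H1-H10-style exchange touches a fixed number
# of cells, so its cost does not grow with the instance size; a repair
# heuristic must scan all nurses for a day, giving O(n) cost.

def horizontal_exchange(roster, nurse, day_a, day_b):
    """Swap two shifts in one nurse's row: a constant-time move."""
    roster[nurse, day_a], roster[nurse, day_b] = (
        roster[nurse, day_b], roster[nurse, day_a])

def repair_undercovered_day(roster, day, shift, demand):
    """Assign off-duty nurses to an undercovered (day, shift) pair.
    One linear scan over the nurses: O(n) in the number of nurses."""
    covered = int(np.sum(roster[:, day] == shift))
    for nurse in range(roster.shape[0]):
        if covered >= demand:
            break
        if roster[nurse, day] == 0:      # nurse is free that day
            roster[nurse, day] = shift   # hypothetical greedy repair
            covered += 1

roster = np.zeros((8, 14), dtype=int)    # 8 nurses, 14 days
repair_undercovered_day(roster, day=0, shift=1, demand=3)
horizontal_exchange(roster, nurse=0, day_a=0, day_b=1)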

In addition, in the last column of Table 10, we provide the best-known solutions collected from the shift-scheduling benchmark datasets [14]. An underline indicates that the best-known solution is equal to the best-known lower bound. These best-known solutions and lower bounds were determined with state-of-the-art exact methods such as CG [25,54], B&P [15,55], and IP [56]. As shown in the table, the gap between the MIP-DNN approach and the best-known solutions obtained by the exact methods is still small; the gap increases only for large instances, such as instances 22, 23, and 24. For instances 6, 10, and 11, the MIP-DNN algorithm can determine a solution with the same penalty value as the best-known lower bound. Notably, however, the time cost of the exact methods is often unacceptable. Therefore, considering both the time costs and the quality of the solutions, the MIP-DNN algorithm is competitive with state-of-the-art methods.

6. Conclusion

We proposed a method that combines learning with a scheduling process. Our method uses deep learning for branch selection in a tree search. In addition, we proposed a general set of feature vectors to describe instances of different sizes, used a DNN model to learn these features, and trained the DNN model on four small instances. Furthermore, a set of perturbation and repair heuristics was used to improve the initial solution, with a choice made based on the decision of the DNN model during each step of the tree search. When the tree search fell into a local optimum, we reconstructed part of the current solution using an embedded MIP solver according to certain rules. By using the remainder of the input as the initial solution of the MIP solver, we significantly accelerated the convergence of the solver and helped the algorithm quickly escape the local optimum.

The novelty of our approach is reflected in the following three points:

• The adoption of the feature vector.
• The DNN-assisted mechanism.
• The embedded MIP solver.

The feature vector can be used to summarize the solution characteristics and describe the heterogeneous NRP solutions as a single vector. The DNN-assisted mechanism can dynamically guide the scheduling behavior. The embedded MIP solver can reconstruct the structure of the solution and escape local optima.

We evaluated different combinations (with and without the DNN model) on benchmark instances. Combination 1 with the DNN model performed better on most instances than the combination without the DNN model, showing the added value of the features and the DNN model. Thus, we demonstrated how existing research and expert knowledge can be combined with deep learning. Furthermore, we evaluated the performance of algorithms embedded with different DNN models trained on data obtained from heuristic methods and from the DQN method. The results showed that the DNN model trained on data obtained from heuristic methods performed better than the DNN model trained on data obtained from the DQN method. The experimental results show that, by applying the optimal configurations to the MIP-DNN model and testing on benchmark instances, our algorithm can produce good solutions and has competitive performance compared with other advanced algorithms. The proposed algorithm can solve problems with very little user input and only four small training instances.

CRediT authorship contribution statement

Ziyi Chen: Writing – original draft, Methodology, Software, Visualization. Patrick De Causmaecker: Funding acquisition, Writing – review & editing. Yajie Dou: Funding acquisition, Writing – review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data are benchmark data available at http://www.schedulingbenchmarks.org/index.html.

Acknowledgments

We thank the editor and the reviewers for their valuable comments and detailed suggestions for improving the presentation of this paper. Furthermore, we acknowledge support from AI in Flanders, the KU Leuven Research Fund [RKU-D2932-C24/17/012], the National Natural Science Foundation of China [71901214; 71690233] and Data-driven logistics, Belgium [FWO-S007318N].

References

[1] E.K. Burke, P. De Causmaecker, G. Vanden Berghe, H.V. Landeghem, The state of the art of nurse rostering, J. Sched. 7 (6) (2004) 441–499.
[2] G. Kazahaya, Harnessing technology to redesign labor cost management reports: Labor costs typically represent over 50 percent of a hospital's total operating expenses. Can the data management process be harnessed to create meaningful labor cost management tools? Healthc. Financ. Manag. 59 (4) (2005) 94–101.
[3] R. M'Hallah, A. Alkhabbaz, Scheduling of nurses: A case study of a Kuwaiti health care unit, Oper. Res. Health Care 2 (1–2) (2013) 1–19.
[4] P. De Causmaecker, G. Vanden Berghe, A categorisation of nurse rostering problems, J. Sched. 14 (1) (2011) 3–16.
[5] H.C. Lau, On the complexity of manpower shift scheduling, Comput. Oper. Res. 23 (1) (1996) 93–102.
[6] P. Smet, P. Brucker, P. De Causmaecker, G. Vanden Berghe, Polynomially solvable personnel rostering problems, European J. Oper. Res. 249 (1) (2016) 67–75.
[7] C. Li, P. Smet, P. De Causmaecker, Hierarchical constraints and their applications in staff scheduling problems, in: Proceedings of the 13th International Conference on the Practice and Theory of Automated Timetabling-PATAT, Vol. 1, 2021, pp. 24–33.
[8] M. Paul, S. Knust, A classification scheme for integrated staff rostering and scheduling problems, RAIRO-Oper. Res. 49 (2) (2015) 393–412.
[9] L. Xu, F. Hutter, H.H. Hoos, K. Leyton-Brown, Hydra-MIP: Automated algorithm configuration and selection for mixed integer programming, in: International Joint Conference on Artificial Intelligence, IJCAI, 2011, pp. 16–30.
[10] H.-P. Nguyen, J. Liu, E. Zio, A long-term prediction approach based on long short-term memory neural networks with automatic parameter optimization by tree-structured Parzen estimator and applied to time-series data of NPP steam generators, Appl. Soft Comput. 89 (2020) 106116.
[11] E. Khalil, P. Le Bodic, L. Song, G. Nemhauser, B. Dilkina, Learning to branch in mixed integer programming, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 724–731.
[12] E. Rahimian, K. Akartunalı, J. Levine, A hybrid integer programming and variable neighbourhood search algorithm to solve nurse rostering problems, European J. Oper. Res. 258 (2) (2017) 411–423.
[13] R. Soto, B. Crawford, E. Monfroy, W. Palma, F. Paredes, Nurse and paramedic rostering with constraint programming: A case study, Roman. J. Inf. Sci. Technol. 16 (1) (2013) 52–64.
[14] Schedulingbenchmarks.org, Shift scheduling benchmark instances, 2006, http://www.schedulingbenchmarks.org/index.html. (Accessed 5 December 2020).
[15] E.K. Burke, T. Curtois, New approaches to nurse rostering benchmark instances, European J. Oper. Res. 237 (1) (2014) 71–81.
[16] P.d. Voogd, Real-life workforce scheduling: Constraint modeling, 2020, Erasmus University, URL http://hdl.handle.net/2105/51690.
[17] M. Jamom, M. Ayob, M. Hadwan, A greedy constructive approach for nurse rostering problem, in: 2011 3rd Conference on Data Mining and Optimization, DMO, 2011, pp. 227–231.
[18] M.A. Awadallah, A.L. Bolaji, M.A. Al-Betar, A hybrid artificial bee colony for a nurse rostering problem, Appl. Soft Comput. (2015).
[19] Z. Liu, Z. Liu, Z. Zhu, Y. Shen, J. Dong, Simulated annealing for a multi-level nurse rostering problem in hemodialysis service, Appl. Soft Comput. (2018).
[20] C. Rae, N. Pillay, A preliminary study into the use of an evolutionary algorithm hyper-heuristic to solve the nurse rostering problem, in: 2012 Fourth World Congress on Nature and Biologically Inspired Computing, NaBIC, 2012, pp. 156–161.


[21] P.S. Chen, Z.Y. Zeng, Developing two heuristic algorithms with metaheuristic algorithms to improve solutions of optimization problems with soft and hard constraints: An application to nurse rostering problems, Appl. Soft Comput. 93 (2020).
[22] E.K. Burke, T. Curtois, G. Post, R. Qu, B. Veltman, A hybrid heuristic ordering and variable neighbourhood search for the nurse rostering problem, European J. Oper. Res. 188 (2) (2008) 330–341.
[23] E.K. Burke, T. Curtois, R. Qu, G. Vanden Berghe, A time predefined variable depth search for nurse rostering, INFORMS J. Comput. 25 (3) (2013) 411–419.
[24] N. Todorovic, S. Petrovic, Bee colony optimization algorithm for nurse rostering, IEEE Trans. Syst. Man Cybern.: Syst. 43 (2) (2013) 467–473.
[25] P. Strandmark, Y. Qu, T. Curtois, First-order linear programming in a column generation-based heuristic approach to the nurse rostering problem, Comput. Oper. Res. 120 (2020) 104945.
[26] M. Abdelghany, A.B. Eltawil, Z. Yahia, K. Nakata, A hybrid variable neighbourhood search and dynamic programming approach for the nurse rostering problem, J. Ind. Manag. Optim. 17 (4) (2021) 2051.
[27] A. Kheiri, A. Gretsista, E. Keedwell, G. Lulli, M.G. Epitropakis, E.K. Burke, A hyper-heuristic approach based upon a hidden Markov model for the multi-stage nurse rostering problem, Comput. Oper. Res. 130 (2021) 105221.
[28] Z. Chen, P. De Causmaecker, Y. Dou, Neural networked assisted tree search for the personnel rostering problem, 2020, arXiv preprint, arXiv:2010.14252.
[29] J.J. Hopfield, D.W. Tank, "Neural" computation of decisions in optimization problems, Biol. Cybern. 52 (3) (1985) 141–152.
[30] O. Vinyals, M. Fortunato, N. Jaitly, Pointer networks, in: Advances in Neural Information Processing Systems, Vol. 28, 2015.
[31] I. Bello, H. Pham, Q.V. Le, M. Norouzi, S. Bengio, Neural combinatorial optimization with reinforcement learning, 2016, arXiv preprint, arXiv:1611.09940.
[32] M. Nazari, A. Oroojlooy, L.V. Snyder, M. Takác, Reinforcement learning for solving the vehicle routing problem, in: Advances in Neural Information Processing Systems, Vol. 31, 2018.
[33] M. Deudon, P. Cournut, A. Lacoste, Y. Adulyasak, L.-M. Rousseau, Learning heuristics for the TSP by policy gradient, in: Integration of Constraint Programming, Artificial Intelligence, and Operations Research, Cham, 2018, pp. 170–181.
[34] W. Kool, H. van Hoof, M. Welling, Attention, learn to solve routing problems!, in: Proceedings of the 7th International Conference on Learning Representations, 2019.
[35] Q. Ma, S. Ge, D. He, D. Thaker, I. Drori, Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning, 2019, arXiv preprint, arXiv:1911.04936.
[36] Z. Li, Q. Chen, V. Koltun, Combinatorial optimization with graph convolutional networks and guided tree search, in: Advances in Neural Information Processing Systems, Vol. 31, 2018.
[37] A. Nowak, S. Villar, A.S. Bandeira, J. Bruna, A note on learning algorithms for quadratic assignment with graph neural networks, in: Proceedings of the 34th International Conference on Machine Learning, Vol. 1050, ICML, 2017, p. 22.
[38] C.K. Joshi, T. Laurent, X. Bresson, An efficient graph convolutional network technique for the travelling salesman problem, 2019, arXiv preprint, arXiv:1906.01227.
[39] M. Lotfi, J. Behnamian, Collaborative scheduling of operating room in hospital network: Multi-objective learning variable neighborhood search, Appl. Soft Comput. 116 (2022) 108233.
[40] J. Yu, X. You, S. Liu, A heterogeneous guided ant colony algorithm based on space explosion and long–short memory, Appl. Soft Comput. 113 (2021) 107991.
[41] A. Kumar, R. Dimitrakopoulos, Production scheduling in industrial mining complexes with incoming new information using tree search and deep reinforcement learning, Appl. Soft Comput. 110 (2021) 107644.
[42] X. Chen, Y. Tian, Learning to perform local rewriting for combinatorial optimization, Adv. Neural Inf. Process. Syst. 32 (2019).
[43] H. Lu, X. Zhang, S. Yang, A learning-based iterative method for solving vehicle routing problems, in: International Conference on Learning Representations, 2019.
[44] Q. Cappart, T. Moisan, L.-M. Rousseau, I. Prémont-Schwarz, A. Cire, Combining reinforcement learning and constraint programming for combinatorial optimization, 2020, arXiv preprint, arXiv:2006.01610.
[45] J.-j. Wu, Y. Lin, Z.-h. Zhan, W.-n. Chen, Y.-b. Lin, J.-y. Chen, An ant colony optimization approach for nurse rostering problem, in: 2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013, pp. 1672–1676.
[46] T. Curtois, R. Qu, Computational Results on New Staff Scheduling Benchmark Instances, Technical Report, ASAP Research Group, School of Computer Science, University of Nottingham, 2014.
[47] T. Messelis, P. De Causmaecker, An NRP feature set, 2010.
[48] G. Zarpellon, J. Jo, A. Lodi, Y. Bengio, Parameterizing branch-and-bound search trees to learn branching policies, 2020, arXiv preprint, arXiv:2002.05120.
[49] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, arXiv preprint, arXiv:1412.6980.
[50] Z. Lü, J.-K. Hao, Adaptive Tabu search for course timetabling, European J. Oper. Res. 200 (1) (2010) 235–244.
[51] Z. Su, Z. Wang, Z. Lv, et al., Weighted tabu search for multi-stage nurse rostering problem, Scientia Sinica Informationis 46 (7) (2016) 834–854.
[52] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529–533.
[53] N. Musliu, F. Winter, A hybrid approach for the sudoku problem: Using constraint programming in iterated local search, IEEE Intell. Syst. 32 (2) (2017) 52–62.
[54] T. Sugawara, What is schedule nurse III, 2021, https://nurse-scheduling-software.com/posts/benchmarks/. (Accessed 23 July 2021).
[55] P. Strandmark, Shift scheduling benchmark, 2016, https://strandmark.wordpress.com/2016/10/09/shift-scheduling-benchmark/. (Accessed 23 July 2021).
[56] P. Smet, A.T. Ernst, G. Vanden Berghe, Heuristic decomposition approaches for an integrated task scheduling and personnel rostering problem, Comput. Oper. Res. 76 (2016) 60–72.
