
Expert Systems With Applications 209 (2022) 118207


journal homepage: www.elsevier.com/locate/eswa

An expert system to react to defective areas in nesting problems


Petra Maria Bartmeyer a,∗, Larissa Tebaldi Oliveira a, Aline Aparecida Souza Leão b, Franklina Maria Bragion Toledo a
a Instituto de Ciências Matemáticas e de Computação - ICMC, Universidade de São Paulo (USP), 13560-970, São Carlos-SP, Brazil
b Departamento de Matemática, Universidade Estadual de Londrina, Cx. Postal 10.011, 86.057-970, Londrina, PR, Brazil

ARTICLE INFO

Keywords:
Nesting problem
Strip-packing problem
Reinforcement learning
Transfer learning
Heuristic

ABSTRACT

Production plans in the textile industry, and other practical applications, involve cutting irregular pieces from raw materials. Defective areas in the raw material may be detected during the cutting process, requiring an adaptation of the original layout. The response time to provide an alternative layout is short, precluding the use of exact methods to overcome defective areas. The main contribution of this paper is to provide an expert system to quickly obtain an alternative layout, overcoming defective areas in the object. The expert system comprises a greedy heuristic based on the allocation sequence suggested by reinforcement learning. Computational experiments have two main objectives. The first one is to validate reinforcement learning as a suitable strategy to tackle nesting problems. The results attest to the ability of the strategy to achieve the best results in the literature. The second objective is to show the ability of the expert system to provide alternative layouts within a short response time. The quality of the solutions obtained by the expert system evidences the strength of the proposed system in overcoming defective areas.

∗ Corresponding author.
E-mail addresses: petra.bartmeyer@icmc.usp.br (P.M. Bartmeyer), larissa.tebaldi@alumni.usp.br (L.T. Oliveira), aasleao@uel.br (A.A.S. Leão), fran@icmc.usp.br (F.M.B. Toledo).

https://doi.org/10.1016/j.eswa.2022.118207
Received 13 September 2021; Received in revised form 30 June 2022; Accepted 15 July 2022
Available online 25 July 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

1. Introduction

Packing problems consist of allocating a set of pieces into a single larger object or a set of larger objects. Features such as the shapes of the pieces and the object (either regular or irregular), the allowed rotations, and the number of pieces define different packing problem categories, all of which are NP-hard (Fowler et al., 1981). A detailed classification of cutting and packing problems is proposed in Wäscher et al. (2007).

Among the cutting and packing problems, the irregular packing problems, also known as nesting problems, combine the combinatorial nature of cutting problems with the challenge of the geometric representation of non-convex pieces. Irregular packing problems have applications in the garment, automotive, leather, and glass cutting industries, where the characteristics of the problem are related to the application. These characteristics include the irregular shape of pieces and objects (convex and/or non-convex), rotation of pieces (free or discrete), and objects with defective areas. Defective areas are frequent when considering natural materials such as leather, granite, and wood (Baldacci et al., 2014).

Different solution approaches have been applied to the nesting problem; a summary of the main concepts and strategies involved in the nesting problem is presented in Bennell and Oliveira (2009) and Leao et al. (2020). Exact methods using mathematical programming have explored continuous (Cherri et al., 2018) and discrete (Toledo et al., 2013) representations of the pieces and the object. Exact methods are not scalable, making the nesting problem a fruitful field for heuristics, e.g., bottom-left (Burke et al., 2006), and meta-heuristic approaches such as cuckoo search (Elkeran, 2013), the Biased Random-Key Genetic Algorithm (BRKGA) (Mundim et al., 2017), and genetic algorithms (Li et al., 2019). Machine learning has also been used as an auxiliary procedure in the different steps of cutting pattern generation. Rakotonirainy (2020) used machine learning to define the best algorithm to solve the regular strip packing problem. Gahm et al. (2022) applied machine learning to predict the feasibility of bin packing instances. Furthermore, reinforcement learning techniques have been applied to the transport-and-pack problem of 3D regular objects (Hu et al., 2020) and the online 3D bin packing problem (Zhao et al., 2021). Both studies consider only regular pieces, allowing a more straightforward feasibility verification than that required by cutting problems with irregular pieces, which are the subject of this research.

This research approaches the nesting problem in a rectangular object with fixed height and unbounded width, with prohibitive zones in the object inspired by problems found in textile industries. The prohibitive areas considered here are unusable zones (defective areas) in the object that have to be excluded from the solution process. In the context of the rectangular pieces cutting problem, Martin et al. (2021) proposed a constrained programming-based algorithm to provide feasible cutting plans for instances with defective rectangular areas. Inspired by the defects in the leather garment and furniture industries, Baldacci et al. (2014) proposed heuristic approaches based on Lagrangian relaxation and local search to provide feasible cutting patterns for nesting problems. There are two main options to represent prohibitive zones. The first option is adapting the object to remove the affected areas, which means creating holes in the object representation (Babu & Babu, 2001; Heistermann & Lengauer, 1995). This strategy is suitable when considering a raster representation and is frequent in leather industry applications. Alternatively, it is possible to keep the original object by representing the defects as artificial pieces superimposed on those zones (Jones, 2014). The second option is applied in this research.

Considering that machine learning methods have been successfully applied in various optimisation contexts (Bengio et al., 2021; Gahm et al., 2022; Rakotonirainy, 2020), this paper draws on reinforcement learning techniques to design an expert system to quickly adapt the initial cutting plan when defects on the object emerge during the cutting process. The developed system applies reinforcement learning embedded in a greedy heuristic to generate feasible cutting plans. This research also studies the benefits and limitations of discrete and continuous representations of the object for the learning process.

In summary, the contributions of this paper are threefold: (i) development of the first reinforcement learning approach dedicated to generating allocation sequences for the nesting problem; (ii) integration of placement heuristics and reinforcement learning to generate good layouts; and (iii) the development of an expert system that allows overcoming defective areas in nesting problems within seconds.

Computational experiments confirm reinforcement learning as a valid strategy to provide suitable allocation sequences for the nesting problem. Comparisons with the best results in the literature attest to the benefits of integrating reinforcement learning and placement heuristics to deliver layouts for the nesting problem. The expert system's ability to overcome defects is confirmed by experiments considering defective areas of diverse sizes and shapes.

The paper is organised as follows. Section 2 contains the main aspects of the problem studied in this paper. Section 3 details the proposed expert system. Computational studies are presented in Section 4, followed by the conclusion in Section 5.

Fig. 1. A layout for the strip packing problem.

Fig. 2. Defective area in the object (red area).

Fig. 3. Alternative layout avoiding the defective area.

2. Problem statement

This study is dedicated to the nesting problem that involves defining a cutting plan for a given number of pieces placed in a larger object of fixed height and infinite width. Moreover, at least one piece has an irregular shape, which can be convex or non-convex. The problem is well known in the literature as the irregular strip packing problem. A feasible layout for this problem is a layout with no overlapping pieces, and the objective is to minimise the used width. Fig. 1 illustrates a feasible layout for the strip packing problem containing eight pieces.

The production process for the cutting pieces studied in this paper comprises two steps. In the first one, a computational tool produces a cutting plan (layout of pieces) for the larger object. In the second step, the cutting plan is executed by a production cutting machine. It is not uncommon that a defective area is detected in the raw material at the start of the second step, requiring a quick action (usually empirical) to update the original cutting plan, avoiding the defective area (Chryssolouris et al., 2000). Fig. 2 represents the case where the cutting machine cannot execute the original cutting plan because a defective area was detected in the object (cross-hatched area). Therefore, as illustrated in Fig. 3, an alternative cutting plan is elaborated. In that regard, this research proposes an expert system designed to deliver a feasible cutting plan avoiding the prohibited zones within a short response time.

3. Expert system designed for dealing with defective areas

This research proposes an expert system to quickly react to defects in the object, i.e., provide an alternative and feasible cutting plan. The system is a collaborative method based on reinforcement learning guided by a heuristic method. As illustrated in Fig. 4, the system is composed of a training phase and a fine-tuning phase. The training phase takes place when the cutting plan is being defined, even though no information about defective areas is available. On the other hand, the fine-tuning phase only occurs if a defective area is detected. In that case, the fine-tuning adapts the previous learning model to a specific defect, providing an alternative layout.

Both phases comprise the same three main steps. In the first step, an allocation sequence for the pieces is obtained based on a given learning matrix. Next, following this sequence, a bottom-left heuristic provides the layout for the nesting problem. In the third step, the quality of the layout is evaluated and used to update the learning matrix. These three steps are repeated until a stopping criterion is met.


Fig. 4. Expert system design.

The difference between the training and fine-tuning phases occurs in the second step, where the bottom-left heuristic may or may not consider defective areas.

The steps of the expert system are detailed in the following subsections. Section 3.1 presents the bottom-left heuristic used here. Section 3.2 explains how to obtain a solution using reinforcement learning. The training phase is presented in Section 3.5. The fine-tuning phase is explained in Section 3.6.

3.1. Bottom-left heuristic

This section presents the heuristic used to obtain a 2D layout for the nesting problem. In the cutting problem context, a constructive heuristic frequently applied is the bottom-left (BL) heuristic (Baker et al., 1980). The BL heuristic follows a placement rule where each piece is allocated in a feasible position at the leftmost position of the object, using the lowest coordinate possible. Note that the solution provided by the BL heuristic depends on the piece allocation sequence.

The literature shows that the order in which the pieces are placed by the BL heuristic influences the quality of the resulting nesting plan (Babu & Babu, 2001; Bennell & Oliveira, 2009; Dowsland et al., 2002; Mundim et al., 2018; Pinheiro et al., 2016), making the sequence of pieces (ordered vector of pieces) a parameter to be optimised. In that regard, Babu and Babu (2001) applied a genetic algorithm to define the best allocation sequence, while Mundim et al. (2017) and Pinheiro et al. (2016) used BRKGA.

Besides the order in which the pieces are positioned, another critical issue in developing a BL heuristic is to define the search space representation, i.e., how the pieces and the object are represented geometrically. The main techniques to represent them are called continuous and discrete representations. The impact of both representations on the computational burden and solution quality is discussed below.

3.1.1. Search space representation

In the continuous representation, pieces can be placed in any object position, as long as they do not overlap and are entirely inside the object. In the discrete representation, the object is described as a grid, and the pieces are placed at one of the grid points. Consequently, the solution quality depends on the granularity of the discretisation: the more points in the representation, the better the solution quality and the higher the computational burden. In both cases, a piece is represented by a single point called the reference point, and placing a piece is equivalent to determining the coordinates of its reference point.

The feasibility verification is the most onerous step in the nesting problem. Some of the tools used to ensure feasibility are phi-functions, raster representation, and no-fit polygons (NFP) (Bennell & Oliveira, 2008). The last option is applied in this research.

NFPs use the reference point to define the forbidden allocation regions between each pair of pieces, i.e., if the reference point is inside the forbidden region, the pieces overlap. For the continuous representation, the NFP is given by the vertices of the polygon defined by the forbidden region, as illustrated by the red polygon in Fig. 5(a). For the discrete representation, also known as raster NFP, the NFP is composed of a list of the forbidden points in the object, as illustrated by the set of red dots in Fig. 5(b). A further verification ensures that the piece is entirely inside the object, defining a feasibility polygon named the inner-fit polygon (IFP). If the reference point of the piece is within the IFP, the piece is inside the object. The IFPs for continuous and discrete representations are illustrated by the blue polygon in Fig. 5(a) and the blue dots in Fig. 5(b), respectively.

This research considers the impact of continuous and discrete representations and the ordered vector of pieces in the learning process. The ordered vector is given by a reinforcement learning strategy, detailed below.

3.1.2. BL considering defective areas

A defective area can have any size and shape and be located at any position in the object. Before starting the BL heuristic, an extra piece (or a set of pieces if necessary) is assigned to represent the defective area. The extra piece has a fixed position and rotation. Then, the BL heuristic is run using the allocation sequence.

Representing the defective areas as an extra piece allows using the concept of NFP to ensure solution feasibility, i.e., if there is no overlap between pieces, the defective area is excluded from the cutting plan.

3.2. Generating a piece allocation sequence

As explained in the previous section, the method used to obtain the layout of each solution is a BL heuristic. The BL heuristic depends on an ordered sequence vector to define the allocation of the pieces. In this research, the ordered sequence vector is generated based on a learning matrix $Q$ using the constructive process described in Algorithm 1. The procedure to obtain the matrix $Q$ used in Algorithm 1 is described in Algorithm 2.

The input of Algorithm 1 is the learning matrix ($Q$), the number of piece types ($n$) and their demands ($d$), and a parameter $ep$. Parameter $ep$ defines a balance between exploration and exploitation (line 7, Algorithm 2). If a random value ($rn(0, 1)$) is smaller than $ep$, the next piece is chosen from a uniform distribution; otherwise, the piece with the largest $Q_{ij}$ is selected. In both cases, the decision process only considers the piece types for which the demand is not fulfilled (set $f$). The vector positions are filled in increasing order. Two random numbers are used in Algorithm 1. First, $rn(0, 1)$ selects a random number from a


uniform distribution between 0 and 1. Second, $rn\_int(f)$ selects an integer number in the set of pieces $f$. After completing position $i$, the demand vector is updated, and a new piece is chosen for the following position, and so forth.

Fig. 5. NFP representations.

Algorithm 1 Generate a piece allocation sequence
 1: Input: $Q$, $n$, $d$, $ep$
 2: Output: $S$                               ⊳ Solution vector defining the allocation sequence
 3: $f \leftarrow \{1, \dots, n\}$            ⊳ Initialise the pieces feasible set
 4: for each position ($i$) of the solution vector $S$
 5:     if $rn(0, 1) \le ep$ then
 6:         $j \leftarrow rn\_int(f)$         ⊳ Roulette selection
 7:     else
 8:         $j \leftarrow \arg\max_{j \in f} Q_{ij}$
 9:     end if
10:     $S_i \leftarrow j$
11:     $d_j \leftarrow d_j - 1$              ⊳ Update the demand of the selected piece
12:     if $d_j = 0$ then
13:         $f \leftarrow f \setminus \{j\}$  ⊳ Remove $j$ from the set of feasible pieces
14:     end if
15: end for

3.3. Initialise learning matrix

Reinforcement learning is a class of machine learning techniques designed to make a sequence of decisions. Reinforcement learning methods are based on trial and error where, after each decision, a function evaluates the quality of the decision: good decisions are rewarded, and bad decisions are penalised. Previous decisions are used to guide future decisions. There are different reinforcement learning methods (Sutton & Barto, 2018), of which the Q-learning method is applied in this research. In the Q-learning method, the Bellman equation (Bertsekas, 2012) is applied to update the learning matrix $Q$ considering the quality of a state/action pair. The Q-learning method is iterative and has guaranteed convergence to the optimal learning matrix ($Q^*$) after a large number of iterations (Watkins, 1989).

The learning matrix $Q$ starts as an empty matrix of dimension $m \times n$. The value $m$ is the number of available states, defined as the total number of pieces to be allocated in the sequence ($m = \sum_{j=1}^{n} d_j$). The value $n$ corresponds to the available actions, defined by the total number of piece types in the instance. The matrix entry $Q_{ij}$ represents the benefit of allocating a piece of type $j$ in the $i$th position of the solution sequence.

Algorithm 2 describes the training process applied to obtain $Q^*$. The input data are the number of piece types ($n$), the demand for each piece type ($d$), the parameters of the reinforcement learning method ($ep$, $\alpha$, $\beta$, $\gamma$, and $\delta_{ep}$), and an initial matrix $Q$. The values for the parameter $\delta_{ep}$, the reward $\alpha$, the penalisation $\beta$, as well as the learning rate $\gamma$ are presented in Section 4.

Algorithm 2 Learning algorithm
 1: Input: $n$, $d$, $ep$, $\alpha$, $\beta$, $\gamma$, $\delta_{ep}$, $Q$
 2: Output: $Q$
 3: while (stop criterion is not met) do
 4:     $S \leftarrow$ Generate a piece allocation sequence   ⊳ Based on matrix $Q$
 5:     $CW \leftarrow$ 2D nesting solution                   ⊳ Solution value using BL heuristic
 6:     Update the $Q$ matrix                                 ⊳ Process described in Algorithm 3
 7:     $ep \leftarrow \delta_{ep} \times ep$
 8: end while
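As an illustration only, Algorithms 1 and 2, together with the update rule of Algorithm 3 (Section 3.4), might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Matlab/C++ implementation used in the experiments: the function names, the zero-based indexing, the fixed iteration budget used as stopping criterion, and the `layout_width` callable standing in for the BL heuristic are all hypothetical.

```python
import random


def generate_sequence(Q, demands, ep, rng):
    """Algorithm 1: build an allocation sequence position by position."""
    remaining = list(demands)                      # copy, so the caller's demands stay intact
    feasible = [j for j, d in enumerate(remaining) if d > 0]
    sequence = []
    for i in range(sum(demands)):                  # one state per position, m = sum of demands
        if rng.random() <= ep:                     # explore: uniform choice among feasible types
            j = rng.choice(feasible)
        else:                                      # exploit: feasible type with the largest Q[i][j]
            j = max(feasible, key=lambda t: Q[i][t])
        sequence.append(j)
        remaining[j] -= 1
        if remaining[j] == 0:                      # demand fulfilled, type no longer feasible
            feasible.remove(j)
    return sequence


def update_q(Q, sequence, best_width, current_width, alpha, beta, gamma):
    """Algorithm 3: reward or penalise every (position, piece type) pair used."""
    target = alpha if best_width - current_width >= 0 else -beta
    for i, j in enumerate(sequence):
        Q[i][j] = (1 - gamma) * Q[i][j] + gamma * target


def train(Q, demands, layout_width, iterations, ep=1.0, alpha=1.0, beta=1.0,
          gamma=0.01, delta_ep=0.005, seed=0):
    """Algorithm 2, with a fixed iteration budget as the (assumed) stop criterion."""
    rng = random.Random(seed)
    best_width = float("inf")
    for _ in range(iterations):
        sequence = generate_sequence(Q, demands, ep, rng)
        current_width = layout_width(sequence)     # hypothetical stand-in for the BL heuristic
        update_q(Q, sequence, best_width, current_width, alpha, beta, gamma)
        best_width = min(best_width, current_width)
        ep = delta_ep * ep                         # line 7 of Algorithm 2
    return Q, best_width
```

Any callable that maps an allocation sequence to a layout width can be passed as `layout_width`. Under the same assumptions, fine-tuning would amount to calling `train` again with the pre-trained `Q` and a width function that accounts for the defective area.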


Each iteration of the learning process generates a new allocation sequence (line 4), and the matching layout is generated by the BL heuristic (line 5). The layout width (CW, current width) is used to update the learning matrix (line 6), as detailed in Algorithm 3. The exploration parameter is updated at the end of each iteration (line 7). In summary, the learning process is guided by the BL heuristic, and the trained matrix guides the optimisation (BL heuristic).

3.4. Reward policy

The widths of the layouts generated at each iteration of Algorithm 2 are used to update $Q$. The width of the best incumbent layout ($BW$) is used to evaluate whether the current layout width ($CW$) is good or bad. Solutions whose layout width is smaller than or equal to that of the best incumbent solution are rewarded; otherwise, they are penalised. The matrix $Q$ is updated using Algorithm 3. Note that the solution vector is only evaluated at the end of the constructive process, when the policy (allocation sequence) is already defined. Lines 5 and 9 of Algorithm 3 present Bellman's equations for reward and penalisation, respectively. Parameters $\alpha$, $\beta$, and $\gamma$ are defined in Section 4.

Algorithm 3 Update learning matrix
 1: Input: $Q$, $BW$, $CW$, $\alpha$, $\beta$, $\gamma$
 2: Output: $Q$
 3: if $(BW - CW) \ge 0$ then                 ⊳ Reward: CW better than or equal to BW
 4:     for each piece $i$ in the allocation sequence
 5:         $Q_{ij} \leftarrow (1 - \gamma) Q_{ij} + \gamma \alpha$
 6:     end for
 7: else                                      ⊳ Penalisation: CW worse than the best solution (BW)
 8:     for each piece in the solution
 9:         $Q_{ij} \leftarrow (1 - \gamma) Q_{ij} - \gamma \beta$
10:     end for
11: end if

3.5. Training phase

For the training phase, Algorithm 2 takes as input an empty matrix ($Q$) whose entries are updated after each iteration, returning a trained matrix $Q$ at the end of the training process.

3.6. Fine-tuning phase

The most time-consuming step of a machine learning process is the training phase. Therefore, handling defective areas by training a learning matrix $Q$ from scratch is not a feasible option due to the short response time available. In that regard, transfer learning strategies can help by allowing the learning process to start from an advanced point, reducing the amount of time spent on training. In the expert system proposed here, this means that an offline phase delivers a pre-trained matrix $Q$ and an online phase (fine-tuning phase) adjusts the matrix $Q$ to the defect. The states and actions are kept the same for instances with and without defective areas, making the fine-tuning phase only accountable for adapting the matrix $Q$ to the changes (defective areas) in the environment (Plisnier et al., 2019).

For the fine-tuning phase, the learning process described in Algorithm 2 starts with the $Q$ matrix delivered by the training phase. In addition, the BL heuristic applied on line 5 now considers the defective areas when generating the layouts.

4. Computational experiments

The computational experiments focus on evaluating the benefits of the proposed expert system in providing good alternative solutions to overcome defective objects in the nesting problem. To this end, they are divided into two stages. The first one validates the training of the reinforcement learning strategy. In this stage, the computational burden implications of continuous and discrete representations for the nesting problem are studied within the training phase (described by Algorithm 2). It also evaluates the performance of the learning strategies in achieving the best results in the literature. The second stage explores the performance of the proposed expert system (Fig. 4) in handling objects with defective areas.

4.1. Dataset and setup

The computational tests were performed on a desktop with an Intel® Core™ i7-10700F CPU @ 2.90 GHz ×16, 64 bits, with 16 GB, using Ubuntu 20.04. The solution methods were coded in Matlab 2020b and C++. The reward weights $\alpha$ and $\beta$ were explored in the interval [0, 10], and the learning rate $\gamma$ in the interval [0, 0.5]. After computational experiments using the irace package (López-Ibáñez et al., 2016), the parameters were set as $\alpha = 1$, $\beta = 1$, $\gamma = 0.01$, and $\delta_{ep} = 0.005$ for Algorithm 2. The initial value of parameter $ep$ was set to one.

Table 1 describes the set of 16 instances considered in the computational experiments. The first column contains the instance name, and the second column presents the total number of piece types (“PT”). The demand for each type of piece is given in the third column (“d”), in which each sequence represents, from left to right, the demand of the first to the last type of piece. For example, SHAPES15 is composed of 15 pieces of the first type, 7 of the second, 9 of the third, and 12 of the fourth piece type. For RCO and BLAZEWICZ instances, the notation $n$ relates the instance name to the piece demand. The last column (“m”) reports the total number of pieces for each instance. Fig. 6 illustrates the shape of the pieces for each family of instances. For more details about these instances, see ESICUP (2021) and Toledo et al. (2013).

Table 1
Instances description.
Instance       PT   d                                 m
RCO$n$         7    ($n$, $n$, $n$, $n$, $n$, $n$)    $7 \times n$
BLAZEWICZ$n$   7    ($n$, $n$, $n$, $n$, $n$, $n$)    $7 \times n$
SHAPES2        4    (2,2,2,2)                         8
SHAPES4        4    (4,4,4,4)                         16
SHAPES5        4    (5,5,5,5)                         20
SHAPES7        4    (7,7,7,7)                         28
SHAPES9        4    (9,7,9,9)                         36
SHAPES15       4    (15,7,9,12)                       43

4.2. Ablation studies

The following studies analyse the impact of the search space representation and piece shapes on the proposed expert system.

4.2.1. Analysis of continuous and discrete representations

As discussed in the literature, a continuous representation allows the BL heuristic to achieve better layouts due to the exact representation of the object and the position between pieces. However, the computational burden entailed by the analysis of non-convex pieces precludes its use to deal with larger instances. For the discrete representation, the solution quality is related to the number of points in the grid, i.e., finer grids (smaller cell sizes) tend to result in better solutions but entail higher computational time. Therefore, a study of the suitable grid size for the discrete representation is presented before comparing the two representations.

Another critical question is the shape of the pieces, i.e., convex or non-convex. The non-convexity of pieces can negatively influence the execution time of the training strategy, since non-convex pieces do not allow straightforward overlap verification.

Influence of the grid size


The performance of the training phase considering grids with different scales is analysed. The analysis compares 0.5, 1, and 2 square unit grids. This experiment uses the RCO instances, i.e., instances composed of only convex pieces. The stopping criteria for the training phase comprise a maximum execution time of 3600 s or the last 50 iterations presenting a median under 110% of the best result in the literature (Elkeran, 2013; Sato et al., 2019). The second criterion requires a minimum of 500 iterations.

Fig. 7(a) presents the median of the solution widths in the last 50 iterations when using each one of the discretisation scales. Fig. 7(b) shows the number of iterations until training phase completion. The discretisation of 1 square unit had the best performance among them. This behaviour is explained by combining the number of executions and the quality of the object representation. While the discretisation of 0.5 square unit has a higher number of points, representing the object better, it also entails a high computational burden, reducing the number of evaluations within the maximum execution time. On the other hand, the discretisation of 2 square units has a small number of points to evaluate, allowing a larger number of iterations within the maximum execution time. However, the small number of points representing the object resulted in poor-quality solutions. In summary, based on these results, grid size 1 is used in the following computational tests.

Fig. 6. Types of pieces in each family of instances.

Influence of non-convex pieces on execution time

The following computational tests evaluate the computational burden entailed by non-convex pieces on the execution of the BL heuristic. Here, RCO and BLAZEWICZ instances were used. BLAZEWICZ instances have 37 of non-convex pieces, while RCO instances, as already mentioned, are composed only of convex pieces. It is noteworthy that RCO is defined by the convex hull of the pieces in BLAZEWICZ.

Since the object representation is one of the critical factors for the BL heuristic performance, the computational tests are carried out for scenarios considering both continuous and discrete representations. The impact of non-convex pieces is evaluated by analysing the amount of time required to execute 500 iterations in each one of the scenarios.

Fig. 8 presents the relation between the total number of pieces in the instances ($m$) and the execution time required for instances with only convex pieces (RCO) and with non-convex pieces (BLAZEWICZ). Note that, for the continuous representation (Fig. 8(a)), the instances with non-convex pieces have a larger processing time than instances composed only of convex pieces. When considering the discrete representation (Fig. 8(b)), the computational effort required by instances with and without non-convex pieces is similar. Having established the pros and cons of the continuous and discrete representations, the following section evaluates the performance of both of them in converging to the $Q^*$ matrix.

4.2.2. Convergence of the training phase

This section presents the performance of Algorithm 2 during the training process. The stopping criteria are composed of a maximum execution time of 3600 s, a maximum of 100 iterations without change in the current solution, and a comparison with the best results reported in the literature (Sato et al., 2019). For the comparison with the literature, the stopping criterion is met if the last 50 iterations have a median smaller than 110% of the best result. Except for the maximum execution time, the other stopping criteria require a minimum of 500 iterations.

Table 2 presents the results for the continuous and discrete representations. The results reported in this section consider a single training cycle for each instance. Note that the discrete representation returns integer solutions. In the columns “Sec”, an execution time smaller than 3600 indicates instances that converged to $Q^*$.

Table 2
Results for the training phase considering 110% of the best solution in the literature. Column “m” reports the total number of pieces. The smallest and median solution widths are reported in columns “Best” and “Med”. The execution time and total number of iterations are presented in columns “Sec” and “Iter”.
Instance      m    Continuous                   Discrete
                   Best   Med    Sec    Iter    Best   Med   Sec    Iter
RCO1          7    8      8      7      500     8      8     41     500
RCO2          14   16     16     17     500     16     16    71     500
RCO3          21   23     24.2   38     500     24     24    147    500
RCO4          28   30.5   30.5   102    500     31     31    504    1011
RCO5          35   39     39.6   220    500     39     39    3463   4576
BLAZEWICZ1    7    8      8      10     500     8      8     47     500
BLAZEWICZ2    14   15     15     148    500     16     16    79     500
BLAZEWICZ3    21   22     22     429    500     23     23    207    500
BLAZEWICZ4    28   29.1   29.1   1427   500     29     30    266    500
BLAZEWICZ5    35   36.9   37     2640   500     36     36    278    500
SHAPES2       8    14     14     108    500     16     16    65     500
SHAPES4       16   27     28     3600   780     28     28    151    500
SHAPES5       20   32.5   35     3600   284     35     35    1627   500
SHAPES7       28   48     49.5   3600   77      48     48    2352   500
SHAPES9       36   53     55     3600   24      52     54    3600   754
SHAPES15      43   66.5   67     3600   27      67     68    3600   374

Table 3 reports the best solution value known in the literature in column “Lit.” (Elkeran, 2013; Sato et al., 2019). The solutions presented in column “RefC” are the best feasible solutions achieved by the NFP-CM model (Cherri et al., 2016) considering a maximum execution time of 3600 s. Column “RefR” presents the best feasible solution achieved by the Dotted-board model (Toledo et al., 2013) with the same maximum execution time (3600 s). Columns “RefC” and “RefR” illustrate the loss in solution quality as the $m$ values increase. In both cases, the notation (∗) highlights the instances with optimal solutions, and the notation (–) marks instances for which no feasible solution was achieved within 3600 s.

The NFP-CM model (RefC) achieved the optimal solution for only 2 instances, and it failed to deliver feasible solutions for 3 instances. The Dotted-board model (RefR), considering a discrete search space, achieved optimal solutions for 6 out of 16 instances, failing to deliver a feasible solution for a single instance. In both cases, the solution quality decreases as the $m$ value increases; the SHAPES instances exemplify this effect. The computational burden entailed by the exact approaches RefC and RefR precludes their use to handle nesting problems with defective areas, where a fast response is required. For instances with a small number of pieces (RCO1, BLAZEWICZ1, and SHAPES2), the training phase using the continuous representation converged to the best solution in the literature. The training phase using the discrete representation presented the same behaviour for the instances RCO1 and BLAZEWICZ1.


Fig. 7. Comparison of the grid scale for the solution quality (left) and number of iterations (right) for RCO instances.

Table 3
Results for the training phase considering 110% of the best solution in the literature.
Columns “𝑚” and “Lit.” report the total number of pieces and the best value in the
literature. Columns “RefC” and “RefR” present the best feasible solutions for the approaches
of Cherri et al. (2016) and Toledo et al. (2013), respectively. The smallest solution widths
are reported in columns “Best”.

                                Continuous        Discrete
Instance       𝑚     Lit.      RefC     Best     RefR    Best
RCO1            7     8.0       8.0∗     8.0      8∗      8
RCO2           14    15.0      15.0     16.0     15∗     16
RCO3           21    22.0      22.2     23.0     22∗     24
RCO4           28    29.0      31.0     30.5     29      31
RCO5           35    36.3      39.0     39.0     36      39
BLAZEWICZ1      7     8.0       7.0      8.0      8∗      8
BLAZEWICZ2     14    14.0      15.8     15.0     14∗     16
BLAZEWICZ3     21    20.2      21.3     22.0     21      23
BLAZEWICZ4     28    27.1      28.8     29.1     28      29
BLAZEWICZ5     35    34.0      –        36.9     35      36
SHAPES2         8    14.0      14.0∗    14.0     14∗     16
SHAPES4        16    25.0      31.0     27.0     –       28
SHAPES5        20    29.0      39.0     32.5     50      35
SHAPES7        28    40.0      –        48.0     47      48
SHAPES9        36    46.4      –        53.0     59      52
SHAPES15       43    58.6      81.0     66.5     77      67

When considering convergence, i.e., achieving a stop criterion other than the maximum
execution time, the discrete representation presented the best results, converging in 14
out of 16 instances. In contrast, the continuous representation converged in 11 out of 16
instances.
Continuous and discrete representations presented similar perfor-
mance when comparing the average layout width. The continuous
representation is better in 11 out of 16 instances, while the discrete
representation is better in 10 of them.
The learning process considering the continuous representation converged for all RCO and
BLAZEWICZ instances. However, it failed to handle the SHAPES instances, except SHAPES2, due
to the small number of iterations performed within the stop criteria. This behaviour is
explained by the large number of non-convexities in the SHAPES instances. The continuous
representation fails to achieve the minimum number of iterations for 5 out of the 16
instances tested.
Using the discrete representation, the total number of iterations increases to compensate
for the discrete representation of the object. The faster evaluation allows performing more
iterations, even creating better results than the ones achieved by the continuous
representation in some cases. Only the instance SHAPES15 did not achieve the minimum of 500
iterations. For instances BLAZEWICZ2, BLAZEWICZ3, and BLAZEWICZ4, the larger number of
iterations required in the training phase when considering the discrete representation is
the result of an unsuitable discretisation of the object. This conclusion is supported by
the result of instance BLAZEWICZ5, which achieves convergence within 300 s even with a
larger number of pieces.

Fig. 8. Effect of non-convex pieces on the computational time required by the BL heuristic to execute 500 iterations.

Table 4
Comparison between 𝑈𝐴 and 𝑈𝐴𝑑 when considering single defective areas represented
by the smallest type of piece.

               Defect                 Continuous        Discrete
Instance       Piece type    𝑈𝐴      𝑈𝐴𝑑     Iter     𝑈𝐴𝑑     Iter
RCO1           7             78.75    62.23    500      72.13    500
RCO2           7             78.75    80.00    500      75.29    500
RCO3           7             78.09    79.03    500      79.63    500
RCO4           7             82.66    79.12    500      79.41    500
RCO5           7             79.58    80.61    500      81.32    411
BLAZEWICZ1     7             67.50    63.83    500      61.83    500
BLAZEWICZ2     7             72.00    74.87    500      68.64    500
BLAZEWICZ3     7             73.63    71.48    500      68.25    500
BLAZEWICZ4     7             74.48    68.15    136      72.64    500
BLAZEWICZ5     7             72.97    69.01     67      75.59    500
SHAPES2        1             57.14    50.14    500      56.14    500
SHAPES4        1             64.00    56.02     44      60.95    169
SHAPES5        1             68.96    56.73     19      56.73    100
SHAPES7        1             70.00    58.45      5      59.25     63
SHAPES9        1             70.43    52.94      3      58.64     60
SHAPES15       1             70.51    51.58      4      59.92     62

4.3. Performance of the expert system

In the textile industry, the time span between the definition and the execution of the
cutting pattern is large. This feature allows the training phase to be executed offline. As
soon as the cutting process starts and a defective area is detected, an alternative cutting
pattern must be provided in a couple of minutes. At this point, a fine-tuning phase (online)
adjusts the matrix 𝑄 from the training phase to the object with the defective area.
In the following tests, defective areas are represented by artificial pieces with fixed
position and fixed rotation (as illustrated in Fig. 3); thus, they are not part of the set
of pieces in the decision process. To mimic the stochastic behaviour of the defective areas,
a uniform distribution is used to define the 2D position of the prohibited allocation area.
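The random placement of a defect can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the text only states that a uniform distribution defines the 2D position, so the function name `sample_defect_position`, the use of the defect's bounding box, the lower-left reference point, and the requirement that the defect lies entirely inside the object are all assumptions.

```python
import random


def sample_defect_position(object_width, object_height,
                           defect_width, defect_height, rng=None):
    """Draw the lower-left corner of a defect's bounding box uniformly at random.

    Assumption (not stated in the paper): the whole bounding box of the
    defect must lie inside the rectangular object.
    """
    rng = rng or random.Random()
    if defect_width > object_width or defect_height > object_height:
        raise ValueError("defect bounding box does not fit inside the object")
    # Each coordinate is sampled independently from a uniform distribution
    # over the range that keeps the bounding box inside the object.
    x = rng.uniform(0.0, object_width - defect_width)
    y = rng.uniform(0.0, object_height - defect_height)
    return x, y


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only for illustration.
    print(sample_defect_position(20.0, 10.0, 2.0, 3.0, random.Random(0)))
```

Passing a seeded `random.Random` instance makes a test scenario reproducible, which is convenient when the same defect position has to be compared across the continuous and discrete representations.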
Computational experiments are divided into two cases. The first case considers small
defective areas where the defect is represented by the smallest pieces in the instance. The
second case varies the size of the defective zone using larger artificial pieces to
represent the defect.
Computational tests in this section consider a maximum execution time of 600 s and a stop
criterion based on 110% of the best solution during the fine-tuning phase. The training
phase considers the same execution time and the stop criteria presented in the previous
section.
The quality of the solutions is analysed by comparing the useful area of the instance
without defects (𝑈𝐴, Eq. (1)) and the useful area of the same instance with a defect
(𝑈𝐴𝑑, Eq. (2)).

\[ UA = \frac{\sum_{i \in P} d_i \, AP_i}{OA} \tag{1} \]

\[ UA_d = \frac{\sum_{i \in P} d_i \, AP_i}{OA - \sum_{j \in D} AD_j} \tag{2} \]

where 𝑑ᵢ is the demand of pieces of type 𝑖, 𝐴𝑃ᵢ is the area of a piece of type 𝑖, 𝐴𝐷ⱼ is
the area of the defect of type 𝑗, and 𝑂𝐴 is the area of the object. The area of the object
is calculated as 𝑂𝐴 = ℎ × 𝐵𝑊, where ℎ is the object height and 𝐵𝑊 is the best width
achieved during the iterations of the fine-tuning phase. The 𝑈𝐴 and 𝑈𝐴𝑑 values allow
measuring the impact of the defect on the solution quality. Closer values of 𝑈𝐴 and 𝑈𝐴𝑑
indicate defects with a small impact on the production plan and an effective reaction of the
expert system.

4.3.1. Impact of a small defective area

The first group of computational tests evaluates the impact of small defective areas on the
area of the layout. To that end, a single defective area was generated for each instance,
represented by the piece type with the smallest area in that instance. In Table 4, column
“Piece type” describes the type of piece used to represent the defective area in each
instance. For instances without defects, column “𝑈𝐴” contains the percentage of the object
area used to fit the best layout from the training phase. For instances with a defect,
column “𝑈𝐴𝑑” contains the percentage of the object area used to fit the best layout
obtained by the expert system. Columns “Iter” present the number of iterations until the
stop criterion is met.
The discrete representation allowed the expert system to achieve better results when
compared with the continuous representation. This behaviour may be related to the larger
number of iterations allowed by the discrete representation when handling large instances.
It is expected that the width of the adapted layout is closer to the original layout width
when considering defects whose area is small compared to the object area. The cases where
𝑈𝐴𝑑 is larger than 𝑈𝐴 are evidence of well-adapted layouts. In the following tests, the
analysis is extended by considering a single defective zone represented by each one of the
types of pieces.

4.3.2. Influence of the size of defective areas

The following study considers objects with a single defective area of different sizes and
shapes. All piece types in each instance are used to represent the defects. The position of
the defective area within the object is generated using a uniform distribution. Figs. 9 and
10 illustrate the impact of instance width and number of pieces (𝑚) on the 𝑈𝐴𝑑 value. The
𝑥-axis presents the type of piece representing the defective area, and the 𝑦-axis
represents the 𝑈𝐴𝑑 value for each defect.
As expected, the larger the number of pieces in the instance, the smaller the impact of the
defective zone and the better the 𝑈𝐴𝑑 value, which results from the small proportion of
the defect area to the total area of the pieces in instances with large 𝑚 values.
The discrete representation presented better 𝑈𝐴𝑑 values when compared to the continuous
representation. This behaviour is well illustrated by the instances RCO1 and BLAZEWICZ1,
where the useful area (𝑈𝐴𝑑) for the discrete representation is greater than or equal to
the results for the continuous representation.
Figs. 11–13 in the Appendix illustrate the impact of defective areas on the solution layout
for the continuous and discrete representations. The defective area is represented by the
piece in red in those figures. The 2D position of the defective area is generated by a
uniform distribution; thus, it is different for each instance and execution.

5. Conclusions

Cutting problems are NP-hard, making the generation of an initial production plan a
time-consuming task. Therefore, when defective areas are identified during the execution of
the production plan, the short reaction time allowed precludes the use of exact methods to
adapt the production plan. In this context, this paper proposed an expert system to quickly
overcome defects in the object (provide an alternative cutting plan). The system generated a
placement sequence using Q-learning and defined the sequence layout using the bottom-left
heuristic.
Q-learning was part of the training and fine-tuning phases of the expert system. The
training phase delivers the optimal training matrix for the nesting problem without
defective areas. After that, when a defective area is detected, the fine-tuning phase
incorporates the information about the defective area in the learning matrix.
When considering the instances without defects, computational experiments indicated that the
Q-learning method is suitable for generating allocation sequences. The method provides
solutions competitive with the best results in the literature. In addition, the method
achieves the best solution in the literature for small instances.


Fig. 9. Effect of different defective areas in the 𝑈 𝐴𝑑 for RCO instances.


Fig. 10. Effect of different defective areas in the 𝑈 𝐴𝑑 for BLAZEWICZ instances.

For the computational studies considering defective areas, the expert system proved to be
efficient in providing good alternative layouts without compromising the response time. A
comprehensive study of the impact of the size of the defective area on the layout width was
provided. As expected, the results indicate that the solution quality is related to the size
of the defect, i.e., small defective areas have a reduced impact on the layout compared to
large defective areas. Also, the studies indicated a worthwhile compromise on the geometric
representation in favour of increasing the number of Q-learning iterations.
Future research will explore the impact of deep reinforcement learning architectures in
generating allocation sequences. Previous research proved that one could achieve better
results by considering groups of pieces with good fitting among them. In that regard, deep
reinforcement learning architectures can describe more complex interactions between groups
of pieces, which may generate better allocation sequences.

CRediT authorship contribution statement

Petra Maria Bartmeyer: Conceptualization, Methodology, Coding, Writing – original draft,
Writing – review & editing. Larissa Tebaldi Oliveira: Coding, Writing – original draft,
Writing – review & editing. Aline Aparecida Souza Leão: Writing – original draft, Writing –
review & editing. Franklina Maria Bragion Toledo: Conceptualization, Methodology, Writing –
original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

This research was funded by São Paulo Research Foundation (FAPESP) grants #2020/15707-6,
#2018/07240-0 and #2013/07375-0 and National Council for Scientific and Technological
Development (CNPq), Brazil, grant #308761/2018-9. The authors are grateful to anonymous
reviewers whose constructive comments helped improve this paper.


Fig. 11. Layout proposed for the instance RCO3 considering a defective area (red piece).

Fig. 12. Layout proposed for the instance BLAZEWICZ3 considering a defective area (red piece).

Fig. 13. Layout proposed for the instance SHAPES7 considering a defective area (red piece).

Table A.5

Parameters
𝑄            Learning matrix
𝑄∗           Optimal learning matrix
𝑒𝑝           Exploration parameter
ℎ            Object height
𝑛            Number of piece types
𝑑            Demand vector of each piece type
𝑚            Total number of pieces, $m = \sum_{i=1}^{n} d_i$
𝛼            Reward parameter
𝛽            Penalisation
𝛾            Learning rate
𝛿𝑒𝑝          Exploration decreasing decay
𝑂𝐴           Area of the object
𝐴𝑃ᵢ          Area of a piece of type 𝑖
𝐴𝐷ᵢ          Area of a defect of type 𝑖
𝑈𝐴           Useful area
𝑈𝐴𝑑          Useful area without defects
𝑟𝑛(0, 1)     Random number between 0 and 1
𝑟𝑛_𝑖𝑛𝑡(𝑓)    Integer number in the set 𝑓

Sets
𝐷            Set of defective areas
𝑓            Piece demand
𝑆            Allocation sequence vector

Abbreviations
NFP          No-fit polygon
IFP          Inner-fit polygon
BL           Bottom-left algorithm
BW           Best layout width
CW           Current layout width
Iter         Iterations within the execution time
Sec          Seconds until the stop criteria
Med          Median value of the last 50 iterations
RefC         Best solution using (Cherri et al., 2016)
RefR         Best solution using (Toledo et al., 2013)
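
As a reading aid for the symbols 𝑂𝐴, 𝐴𝑃ᵢ, 𝐴𝐷ᵢ, 𝑈𝐴 and 𝑈𝐴𝑑 listed in Table A.5, the two useful-area measures of Eqs. (1) and (2) can be sketched in Python. The function names and the numbers in the example are hypothetical and chosen only for illustration; the functions return fractions, whereas Table 4 reports percentages.

```python
def useful_area(demands, piece_areas, height, best_width):
    """Eq. (1): total demanded piece area divided by the object area OA = h * BW."""
    if len(demands) != len(piece_areas):
        raise ValueError("one area is required per piece type")
    pieces_area = sum(d * a for d, a in zip(demands, piece_areas))
    object_area = height * best_width
    return pieces_area / object_area


def useful_area_with_defects(demands, piece_areas, defect_areas,
                             height, best_width):
    """Eq. (2): as Eq. (1), but the defect areas are removed from the object area."""
    if len(demands) != len(piece_areas):
        raise ValueError("one area is required per piece type")
    pieces_area = sum(d * a for d, a in zip(demands, piece_areas))
    usable_area = height * best_width - sum(defect_areas)
    if usable_area <= 0:
        raise ValueError("defects cover the whole object")
    return pieces_area / usable_area


if __name__ == "__main__":
    # Hypothetical instance: two piece types, object height 5, best width 4.
    demands = [2, 1]
    piece_areas = [3.0, 4.0]
    print(useful_area(demands, piece_areas, 5.0, 4.0))
    print(useful_area_with_defects(demands, piece_areas, [2.0], 5.0, 4.0))
```

For the same best width, the denominator of Eq. (2) is smaller than that of Eq. (1), so 𝑈𝐴𝑑 can exceed 𝑈𝐴 only when the adapted layout is not much wider than the original one.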


Appendix

See Figs. 11–13 and Table A.5.

References

Babu, A. R., & Babu, N. R. (2001). A generic approach for nesting of 2-D parts in 2-D sheets using genetic and heuristic algorithms. Computer-Aided Design, 33(12), 879–891.
Baker, B. S., Coffman, E. G., & Rivest, R. L. (1980). Orthogonal packings in two dimensions. SIAM Journal on Computing, 9(4), 846–855.
Baldacci, R., Boschetti, M. A., Ganovelli, M., & Maniezzo, V. (2014). Algorithms for nesting with defects. Discrete Applied Mathematics, 163, 17–33.
Bengio, Y., Lodi, A., & Prouvost, A. (2021). Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2), 405–421.
Bennell, J. A., & Oliveira, J. F. (2008). The geometry of nesting problems: A tutorial. European Journal of Operational Research, 184(2), 397–415.
Bennell, J. A., & Oliveira, J. F. (2009). A tutorial in irregular shape packing problems. Journal of the Operational Research Society, 60, S93–S105.
Bertsekas, D. (2012). Dynamic programming and optimal control: Vol. I. Athena Scientific.
Burke, E., Hellier, R., Kendall, G., & Whitwell, G. (2006). A new bottom-left-fill heuristic algorithm for the two-dimensional irregular packing problem. Operations Research, 54(3), 587–601. http://dx.doi.org/10.1287/opre.1060.0293.
Cherri, L. H., Cherri, A. C., & Soler, E. M. (2018). Mixed integer quadratically-constrained programming model to solve the irregular strip packing problem with continuous rotations. Journal of Global Optimization, 72(1), 89–107.
Cherri, L. H., Mundim, L. R., Andretta, M., Toledo, F. M., Oliveira, J. F., & Carravilla, M. A. (2016). Robust mixed-integer linear programming models for the irregular strip packing problem. European Journal of Operational Research, 253(3), 570–583.
Chryssolouris, G., Papakostas, N., & Mourtzis, D. (2000). A decision-making approach for nesting scheduling: a textile case. International Journal of Production Research, 38(17), 4555–4564.
Dowsland, K. A., Vaid, S., & Dowsland, W. B. (2002). An algorithm for polygon placement using a bottom-left strategy. European Journal of Operational Research, 141(2), 371–381.
Elkeran, A. (2013). A new approach for sheet nesting problem using guided cuckoo search and pairwise clustering. European Journal of Operational Research, 231(3), 757–769.
ESICUP (2021). Working group on cutting and packing within EURO. www.euro-online.org/websites/esicup/data-sets/#1535972088237-bbcb74e3-b507.
Fowler, R. J., Paterson, M. S., & Tanimoto, S. L. (1981). Optimal packing and covering in the plane are NP-complete. Information Processing Letters, 12(3), 133–137.
Gahm, C., Uzunoglu, A., Wahl, S., Ganschinietz, C., & Tuma, A. (2022). Applying machine learning for the anticipation of complex nesting solutions in hierarchical production planning. European Journal of Operational Research, 296(3), 819–836.
Heistermann, J., & Lengauer, T. (1995). The nesting problem in the leather manufacturing industry. Annals of Operations Research, 57(1), 147–173.
Hu, R., Xu, J., Chen, B., Gong, M., Zhang, H., & Huang, H. (2020). TAP-Net: transport-and-pack using reinforcement learning. ACM Transactions on Graphics, 39(6), 1–15.
Jones, D. R. (2014). A fully general, exact algorithm for nesting irregular shapes. Journal of Global Optimization, 59(2–3), 367–404.
Leao, A. A., Toledo, F. M., Oliveira, J. F., Carravilla, M. A., & Alvarez-Valdés, R. (2020). Irregular packing problems: A review of mathematical models. European Journal of Operational Research, 282(3), 803–822.
Li, X., Wang, Z., Chan, F. T., & Chung, S. H. (2019). A genetic algorithm for optimizing space utilization in aircraft hangar shop. International Transactions in Operational Research, 26(5), 1655–1675.
López-Ibáñez, M., Dubois-Lacoste, J., Cáceres, L. P., Birattari, M., & Stützle, T. (2016). The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives, 3, 43–58.
Martin, M., Morabito, R., & Munari, P. (2021). Two-stage and one-group two-dimensional guillotine cutting problems with defects: a CP-based algorithm and ILP formulations. International Journal of Production Research, 1–20.
Mundim, L. R., Andretta, M., Carravilla, M. A., & Oliveira, J. F. (2018). A general heuristic for two-dimensional nesting problems with limited-size containers. International Journal of Production Research, 56(1–2), 709–732.
Mundim, L. R., Andretta, M., & de Queiroz, T. A. (2017). A biased random key genetic algorithm for open dimension nesting problems using no-fit raster. Expert Systems with Applications, 81, 358–371.
Pinheiro, P. R., Júnior, B. A., & Saraiva, R. D. (2016). A random-key genetic algorithm for solving the nesting problem. International Journal of Computer Integrated Manufacturing, 29(11), 1159–1165.
Plisnier, H., Steckelmacher, D., Roijers, D. M., & Nowé, A. (2019). Transfer reinforcement learning across environment dynamics with multiple advisors. In BNAIC/BENELEARN.
Rakotonirainy, R. G. (2020). A machine learning approach for automated strip packing algorithm selection. ORiON, 36(2), 73–88.
Sato, A. K., Martins, T. C., Gomes, A. M., & Tsuzuki, M. S. G. (2019). Raster penetration map applied to the irregular packing problem. European Journal of Operational Research, 279(2), 657–671.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: an introduction. MIT Press.
Toledo, F. M., Carravilla, M. A., Ribeiro, C., Oliveira, J. F., & Gomes, A. M. (2013). The dotted-board model: A new MIP model for nesting irregular shapes. International Journal of Production Economics, 145(2), 478–487.
Wäscher, G., Haußner, H., & Schumann, H. (2007). An improved typology of cutting and packing problems. European Journal of Operational Research, 183, 1109–1130.
Watkins, C. J. C. H. (1989). Learning from delayed rewards (Ph.D. thesis), Cambridge, United Kingdom: King’s College.
Zhao, H., She, Q., Zhu, C., Yang, Y., & Xu, K. (2021). Online 3D bin packing with constrained deep reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35 (pp. 741–749). URL: https://ojs.aaai.org/index.php/AAAI/article/view/16155.
