CHAPTER 1
INTRODUCTION
A two-stage hybrid flow shop scheduling problem (THFSP) consists of two stages, at
least one of which contains parallel machines. This NP-hard problem is a special case of the
hybrid flow shop scheduling problem (HFSP). A number of results have been obtained in both
single-factory and multi-factory settings. Various methods, including exact, heuristic, and
metaheuristic methods, have been applied to solve the THFSP in a single-factory setting, for
example by using an exact method together with several heuristics to minimize makespan.
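To make the problem concrete, the makespan of a given job sequence in a two-stage hybrid flow shop can be evaluated by assigning each job to the earliest-available parallel machine at each stage. A minimal sketch, with hypothetical instance data (the machine counts and processing times below are illustrative, not taken from the studied instances):

```python
def thfsp_makespan(order, p1, p2, m1, m2):
    """Makespan of a job sequence: each stage assigns the next job to the
    earliest-available parallel machine (simple list scheduling)."""
    free1 = [0.0] * m1          # next free time of each stage-1 machine
    free2 = [0.0] * m2          # next free time of each stage-2 machine
    done1 = {}                  # stage-1 completion time of each job
    for j in order:             # stage 1: process jobs in sequence order
        k = min(range(m1), key=lambda i: free1[i])
        free1[k] += p1[j]
        done1[j] = free1[k]
    finish = 0.0
    # stage 2: jobs become available as they finish stage 1 (FIFO)
    for j in sorted(order, key=lambda j: done1[j]):
        k = min(range(m2), key=lambda i: free2[i])
        start = max(free2[k], done1[j])
        free2[k] = start + p2[j]
        finish = max(finish, free2[k])
    return finish

p1 = [3, 2, 4]                  # stage-1 processing times of jobs 0..2
p2 = [2, 5, 1]                  # stage-2 processing times
print(thfsp_makespan([0, 1, 2], p1, p2, m1=2, m2=1))  # 10
```

Evaluating a schedule this way is the inner step that any heuristic or metaheuristic for the THFSP repeats many times.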
Production has shifted from single factories to multiple factories with the further
development of globalization. As a result, distributed scheduling problems in multiple factories
have become a main topic of production scheduling in recent years. The distributed two-stage
hybrid flow shop scheduling problem (DTHFSP) is therefore considered in this work.
The integration of RL with metaheuristics enables, among other things, the dynamic
selection of search operators and the adaptive adjustment of parameter settings. As a result,
integrating RL with a metaheuristic can improve the performance of the latter, making it an
effective approach to obtaining high-quality solutions.
In this work, the DTHFSP with fuzzy processing time is studied, and a novel algorithm
called QTLBO is constructed by integrating the Q-learning algorithm with teaching-learning-based
optimization (TLBO) to minimize makespan.
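In the fuzzy scheduling literature, fuzzy processing times are often represented as triangular fuzzy numbers, with addition performed componentwise and comparison done through a ranking function. A minimal sketch, assuming the triangular form and a mean-value ranking (both are common conventions in this literature, not necessarily the exact rules used by QTLBO):

```python
# A triangular fuzzy time is a tuple t = (a, b, c) with a <= b <= c.

def f_add(t, u):
    """Fuzzy addition: componentwise sum of triangular fuzzy numbers."""
    return (t[0] + u[0], t[1] + u[1], t[2] + u[2])

def f_rank(t):
    """Rank a triangular fuzzy number by its mean value (a + 2b + c) / 4."""
    return (t[0] + 2 * t[1] + t[2]) / 4

def f_max(t, u):
    """Approximate fuzzy max: keep the operand with the larger rank."""
    return t if f_rank(t) >= f_rank(u) else u

c1 = f_add((2, 3, 4), (1, 2, 3))   # fuzzy completion time (3, 5, 7)
print(f_rank(c1))                  # 5.0
```

With these three operations, the crisp makespan recursion carries over to fuzzy processing times: completion times are built with `f_add` and `f_max`, and schedules are compared by the rank of their fuzzy makespan.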
CHAPTER 2
LITERATURE SURVEY
1) Title: “A Review of Reinforcement Learning Based Intelligent Optimization for
Manufacturing Scheduling”
Authors and Year: Ling Wang, Zixiao Pan, and Jingjing Wang, 2021.
Description: The paper explores the integration of machine learning techniques into
metaheuristics for solving combinatorial optimization problems. It delves into three levels of
integration: problem-level integration, high-level integration between metaheuristics, and
low-level integration within a metaheuristic. The authors provide a comprehensive review of
how machine learning techniques can be used in various elements of meta-heuristics, such as
algorithm selection, fitness evaluation, initialization, evolution, parameter setting, and
cooperation. They discuss the advantages, limitations, requirements, and challenges of
implementing machine learning at each level of integration. The paper also identifies research
gaps and proposes future research directions in this domain.
2) Title: “Effective heuristics and metaheuristics to minimize total flowtime for the
distributed permutation flow shop problem”
Authors and Year: Quan-Ke Pan, Liang Gao, Ling Wang, Jing Liang, Xin-Yu Li, 2019.
Description: This paper focuses on addressing the
distributed permutation flow shop scheduling problem (DPFSP) through the application of
heuristics and metaheuristics. The research explores the use of various algorithms, including
artificial bee colony, scatter search, iterated local search, and iterated greedy, to optimize
scheduling in flow shop environments. By proposing new heuristics and metaheuristics, the
study aims to minimize the total flowtime in manufacturing processes, ultimately enhancing
efficiency and productivity. The paper presents computational results, comparisons, and
experimental findings that demonstrate the effectiveness of the proposed approaches in
improving scheduling outcomes in dynamic manufacturing settings.
CHAPTER 3
3.1 PROBLEM STATEMENT AND OBJECTIVES
3.1.1 Problem Statement:
3.1.2 Objectives:
Problem Specificity: Tailor the Q-table structure and reward function to effectively capture
the characteristics of the DTHFSP.
Parameter Tuning: Optimize QTLBO's parameters (learning rate, discount factor, etc.) for
optimal performance on the specific DTHFSP instances considered.
3.2 METHODOLOGY
➢ Q-Learning:
• Q-learning is used to solve the distributed two-stage hybrid flow shop scheduling
problem (DTHFSP) with fuzzy processing time.
• The Q-learning algorithm is implemented using 9 states and 4 actions.
• The algorithm structure is dynamically adjusted through adaptive action selection.
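The 9-state, 4-action Q-table described above can be sketched as follows; the epsilon-greedy selection rule and the epsilon value are illustrative assumptions, standing in for the adaptive action selection:

```python
import random

# Q-table with the dimensions stated above: 9 states x 4 actions.
N_STATES, N_ACTIONS = 9, 4
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(state, epsilon=0.1):
    """Epsilon-greedy: explore a random action with probability epsilon,
    otherwise exploit the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    row = q_table[state]
    return row.index(max(row))

print(select_action(0, epsilon=0.0))  # greedy on an all-zero table: action 0
```

As Q-values accumulate during the search, the greedy branch increasingly favors the actions that have paid off in each state, which is what dynamically adjusts the algorithm's behavior.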
➢ Reinforcement Learning (RL):
➢ Metaheuristic:
• Do forever:
Take action a in state s; observe reward r and the next state s'
Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
s <- s'
Initialization:
1. Q-Table Creation:
o Create a data structure called a Q-table. This table stores the Q-values, which
represent the expected long-term reward of taking a specific action (a) in a
particular state (s).
o Initialize all entries in the Q-table to zero. This signifies that the agent has no
initial knowledge about the value of any state-action pair.
Learning Loop:
2. State Observation: At each time step, the agent observes the current state (s) of the
environment. This state could represent various factors depending on the problem,
such as the position of a robot in a maze or the current resources in a game.
The agent then updates the Q-value of the current state-action pair using the Q-learning update
equation. This equation combines the immediate reward (r), the estimated optimal
future reward from the new state (max_a' Q(s', a')), and a learning rate (alpha) that
controls how much the agent learns from new experiences:
o Q(s, a) = (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))
▪ alpha (learning rate): Determines the weight given to the new experience
(r + gamma * max_a' Q(s', a')) compared to the previous Q-value. A
higher alpha leads to faster but potentially less stable learning, while a
lower alpha leads to slower but more stable learning.
▪ gamma (discount factor): Controls the importance of future rewards. A
higher gamma means the agent values future rewards more and plans for
longer-term goals.
7. Repeat:
o The agent continues by returning to step 2 and observing the new state, repeating
the learning process until it converges or reaches a stopping criterion.
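The initialization and learning loop above can be sketched as a complete tabular Q-learning run. The toy environment (a 5-state chain with a reward at the rightmost state) and the parameter values are illustrative only:

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
N_STATES, N_ACTIONS = 5, 2           # actions: 0 = move left, 1 = move right
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # step 1: all-zero Q-table

def step(s, a):
    """Toy transition: reward 1 for reaching the last state, else 0."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(200):                 # learning loop over episodes
    s = 0                            # step 2: observe the current state
    while s != N_STATES - 1:
        a = random.randrange(N_ACTIONS) if random.random() < EPSILON \
            else Q[s].index(max(Q[s]))            # epsilon-greedy action
        s2, r = step(s, a)
        # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))
        Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s2]))
        s = s2                       # step 7: repeat from the new state

print(Q[0].index(max(Q[0])))         # learned action at state 0: 1 (move right)
```

After enough episodes the Q-values along the chain converge toward GAMMA raised to the distance from the goal, so the greedy policy at every state points right.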
1. Initialization:
The algorithm begins by setting up the initial population of solutions for the
scheduling problem. These solutions represent different ways to schedule the jobs
across the multiple factories.
2. Sorting Step:
The solutions in the population are ranked according to a specific criterion, most
likely their makespan (total completion time). The solution with the shortest makespan
is considered the best.
3. Q-Learning Step:
Based on the decision from the Q-learning step, one of the following four phases is
implemented:
▪ Teacher Phase
▪ Learner Phase
▪ Teacher's Self-Learning Phase
▪ Learner's Self-Learning Phase
5. Stopping Check:
The algorithm checks whether a stopping criterion has been met. This criterion might
be a certain number of iterations or the achievement of a desired makespan.
6. End:
If the stopping condition is met, the algorithm terminates and returns the best
solution found so far, which represents the final schedule for the DTHFSP.
7. Loop:
If the stopping condition is not met, the algorithm returns to step 3 and repeats
the process.
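The loop described above can be sketched as follows. The four phase operators are placeholders (here, random tweaks to toy numeric solutions), since the real teacher/learner operators act on job and factory assignments and are problem-specific; the single-state Q-table and the parameter values are also illustrative assumptions:

```python
import random

PHASES = ["teacher", "learner", "teacher_self", "learner_self"]

def apply_phase(name, pop, evaluate):
    """Placeholder operator: perturb each solution, keep the better variant."""
    return [min(x, x + random.uniform(-1, 0.5), key=evaluate) for x in pop]

def qtlbo(evaluate, pop, max_iters=100, alpha=0.1, gamma=0.9, eps=0.1):
    pop = sorted(pop, key=evaluate)            # step 2: rank by objective
    q = [0.0] * len(PHASES)                    # one-state Q-table (assumption)
    for _ in range(max_iters):                 # main loop until the budget ends
        a = (random.randrange(len(PHASES)) if random.random() < eps
             else q.index(max(q)))             # step 3: Q-learning picks a phase
        best_before = evaluate(pop[0])
        pop = sorted(apply_phase(PHASES[a], pop, evaluate), key=evaluate)
        reward = 1.0 if evaluate(pop[0]) < best_before else 0.0
        q[a] += alpha * (reward + gamma * max(q) - q[a])   # Q-value update
    return pop[0]                              # step 6: best solution found

random.seed(1)
best = qtlbo(evaluate=abs, pop=[5.0, 8.0, 3.0])   # toy run: minimize |x|
```

The key design point is that each phase earns a reward only when it improves the best solution, so over the run the Q-values steer the search toward whichever phase is currently most productive.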
3.3.1 Results:
3.3.2 Discussion:
Further research is needed to explore how QTLBO scales to extremely large and
complex distributed networks. Additionally, advancements in explainable AI could improve
transparency in the system's decision-making process.
3.4.2 Disadvantages:
CHAPTER 4
In future work, we will attempt to solve the distributed scheduling problem with
uncertainty by using various metaheuristics. Previous works have mainly used a small number
of RL algorithms, particularly Q-learning. Related to this, the integration of other RL algorithms
with metaheuristics for production scheduling remains a promising direction.
REFERENCES
[1] M. Karimi-Mamaghan, M. Mohammadi, P. Meyer, A. M. Karimi-Mamaghan, and
E.-G. Talbi, Machine learning at the service of meta-heuristics for solving combinatorial
optimization problems: A state-of-the-art, Eur. J. Oper. Res., vol. 296, no. 2, pp. 393–422, 2022.
[2] J. Wang, D. M. Lei, and J. C. Cai, An adaptive artificial bee colony with reinforcement
learning for distributed three-stage assembly scheduling with maintenance, Appl. Soft
Comput., vol. 117, p. 108371, 2021.
[3] L. Wang, Z. X. Pan, and J. J. Wang, A review of reinforcement learning based intelligent
optimization for manufacturing scheduling, Complex Syst. Model. Simul., vol. 1, no. 4, pp. 257–
270, 2021.
[4] J. Q. Li, J. K. Li, L. J. Zhang, H. Y. Sang, Y. Y. Han, and Q. D. Chen, Solving type-2
fuzzy distributed hybrid flowshop scheduling using an improved brain storm optimization
algorithm, Int. J. Fuzzy Syst., vol. 23, pp. 1194–1212, 2021.
[5] Z. S. Shao, W. S. Shao, and D. C. Pi, Effective heuristics and metaheuristics for the
distributed fuzzy blocking flowshop scheduling problem, Swarm Evol. Comput., vol. 59, p.
100747, 2020.
[6] J. Wang, X. D. Wang, F. Chu, and J. B. Yu, An energy-efficient two-stage hybrid flow
shop scheduling problem in a glass production, Int. J. Prod. Res., vol. 58, no. 8, pp. 2283–2314,
2020.
[8] B. Fan, W. Yang, and Z. Zhang, Solving the two-stage hybrid flow shop scheduling
problem based on mutant firefly algorithm, J. Amb. Intel. Hum. Comp., vol. 10, no. 3, pp. 979–
990, 2019.