CHAPTER 1
INTRODUCTION
The two-stage hybrid flow shop scheduling problem (THFSP) consists of two stages, at
least one of which comprises parallel machines. This NP-hard problem is a special case of the
hybrid flow shop scheduling problem (HFSP). A number of results have been obtained in
both single-factory and multi-factory settings. Various methods, including exact, heuristic, and
metaheuristic methods, have been applied to solve the THFSP in a single-factory setting, for
example by using an exact method and several heuristics to minimize makespan.
With the further development of globalization, production has shifted from single
factories to multiple factories. As a result, distributed scheduling problems across multiple
factories have become a main topic in production scheduling in recent years. The
distributed two-stage hybrid flow shop scheduling problem (DTHFSP) is considered in
this work.
The integration of RL and metaheuristics can enable, among other things, the dynamic
selection of search operators or the adaptive adjustment of parameter settings. As a result,
integrating RL with a metaheuristic can improve the performance of the latter, making it an
effective approach to obtaining high-quality solutions.
The DTHFSP with fuzzy processing time is studied, and a novel algorithm called the
QTLBO is constructed through the integration of the Q-learning algorithm and the TLBO to
minimize makespan.
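Fuzzy processing times in such problems are commonly modeled as triangular fuzzy numbers. The following is a minimal sketch of the fuzzy arithmetic a makespan computation needs; the component-wise max and the graded-mean ranking used here are common conventions in the fuzzy-scheduling literature, assumed for illustration rather than taken from this work.

```python
# Sketch: triangular fuzzy numbers (TFNs) (a, b, c) for uncertain
# processing times. Addition is component-wise; the fuzzy max is
# approximated component-wise, a common convention.

def tfn_add(x, y):
    """Component-wise addition of two TFNs (a, b, c)."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def tfn_max(x, y):
    """Approximate fuzzy max: component-wise maximum."""
    return tuple(max(xi, yi) for xi, yi in zip(x, y))

def defuzzify(x):
    """Graded mean value (a + 2b + c) / 4, one common ranking criterion."""
    a, b, c = x
    return (a + 2 * b + c) / 4

# Two jobs on one machine: fuzzy completion time of the second job.
p1, p2 = (2, 3, 5), (1, 2, 4)
c2 = tfn_add(p1, p2)          # (3, 5, 9)
print(c2, defuzzify(c2))
```

Ranking candidate schedules by the defuzzified makespan is what allows a crisp "best solution" to be chosen despite the uncertainty.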
CHAPTER 2
LITERATURE SURVEY
1) Title: “A Review of Reinforcement Learning Based Intelligent Optimization for
Manufacturing Scheduling”
Authors and Year: Ling Wang, Zixiao Pan, and Jingjing Wang, 2021.
Description: The paper explores the integration of machine learning techniques into
metaheuristics for solving combinatorial optimization problems. It delves into three levels of
integration: problem level integration, high-level integration between meta-heuristics, and
low-level integration within a meta-heuristic. The authors provide a comprehensive review of
how machine learning techniques can be used in various elements of meta-heuristics, such as
algorithm selection, fitness evaluation, initialization, evolution, parameter setting, and
cooperation. They discuss the advantages, limitations, requirements, and challenges of
implementing machine learning at each level of integration. The paper also identifies research
gaps and proposes future research directions in this domain.
3) Title: “Effective heuristics and metaheuristics to minimize total flowtime for the
distributed permutation flow shop problem”
Authors and Year: Quan-Ke Pan, Liang Gao, Ling Wang, Jing Liang, Xin-Yu Li, 2019.
Description: This paper focuses on addressing the distributed permutation flow shop
scheduling problem (DPFSP) through the application of
heuristics and metaheuristics. The research explores the use of various algorithms, including
artificial bee colony, scatter search, iterated local search, and iterated greedy, to optimize
scheduling in flow shop environments. By proposing new heuristics and metaheuristics, the
study aims to minimize the total flowtime in manufacturing processes, ultimately enhancing
efficiency and productivity. The paper presents computational results, comparisons, and
experimental findings that demonstrate the effectiveness of the proposed approaches in
improving scheduling outcomes in dynamic manufacturing settings.
CHAPTER 3
3.1 PROBLEM STATEMENT AND OBJECTIVES
3.1.1 Problem Statement:
3.1.2 Objectives:
Problem Specificity: Tailor the Q-table structure and reward function to effectively capture
the DTHFSP problem characteristics.
Parameter Tuning: Optimize QTLBO's parameters (learning rate, discount factor, etc.) for
the best performance on the specific DTHFSP instances considered.
3.2 METHODOLOGY
Q-Learning:
• Q-learning is used to solve the distributed two-stage hybrid flow shop scheduling
problem (DTHFSP) with fuzzy processing time.
• The Q-learning algorithm is implemented using 9 states and 4 actions.
• The algorithm structure is dynamically adjusted through adaptive action selection.
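The text specifies 9 states and 4 actions but does not give the exact state encoding or reward function, so the following is a hypothetical sketch of ε-greedy action selection over such a 9×4 Q-table; the state meanings and ε value are illustrative assumptions.

```python
import random

N_STATES, N_ACTIONS = 9, 4   # counts taken from the text; encoding unknown
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(state, epsilon=0.1):
    """Epsilon-greedy: with probability epsilon pick a random action
    (exploration), otherwise pick the highest-valued action for the
    current state (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    row = q_table[state]
    return max(range(N_ACTIONS), key=row.__getitem__)
```

With an all-zero table every action looks equally good; as Q-values are learned, `select_action` increasingly exploits them, which is how the algorithm structure is adjusted adaptively.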
Reinforcement Learning (RL):
Metaheuristic:
• Do forever:
Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]
s ← s′
Initialization:
1. Q-Table Creation:
o Create a data structure called a Q-table. This table stores the Q-values, which
represent the expected long-term reward of taking a specific action (a) in a
particular state (s).
o Initialize all entries in the Q-table to zero. This signifies that the agent has no
initial knowledge about the value of any state-action pair.
Learning Loop:
2. State Observation: At each time step, the agent observes the current state (s) of the
environment. This state could represent various factors depending on the problem,
such as the position of a robot in a maze or the current resources in a game.
3. Action Selection: The agent selects an action (a), typically with an ε-greedy policy:
with probability ε it explores a random action, and otherwise it exploits the action
with the highest Q-value in the current state.
4. Action Execution: The agent performs the selected action in the environment.
5. Reward Observation: The environment returns a reward (r) and transitions to a new
state (s′).
6. Q-Value Update: The entry Q(s, a) is updated toward r + γ max_a′ Q(s′, a′) using
the learning rate α.
7. Repeat:
o The agent continues by returning to step 2 and observing the new state,
repeating the learning process until it converges or reaches a stopping
criterion.
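The loop above is standard tabular Q-learning. A minimal runnable sketch follows; the toy chain environment, reward, and parameter values are illustrative assumptions, not the DTHFSP environment.

```python
import random

def q_learning(n_states=5, n_actions=2, alpha=0.1, gamma=0.9,
               epsilon=0.3, episodes=200):
    """Tabular Q-learning on a toy chain: action 1 moves right, action 0
    moves left; reaching the last state yields reward 1 and ends the episode."""
    Q = [[0.0] * n_actions for _ in range(n_states)]  # Q-table, all zeros
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=Q[s].__getitem__)
            # execute the action in the toy environment
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            # observe the reward
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-value update
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next  # move to the new state and repeat
    return Q

Q = q_learning()
# After training, moving right should dominate near the goal state.
```

The same update rule drives QTLBO, with states and actions describing the search process instead of a physical environment.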
1. Initialization:
The algorithm begins by setting up the initial population of solutions for the
scheduling problem. These solutions represent different ways to schedule the jobs
across the multiple factories.
2. Sorting Step:
The solutions in the population are ranked based on a specific criterion, typically
their makespan (total completion time). The solution with the shortest
makespan is considered the best.
3. Q-Learning Step:
The Q-learning agent decides which of the following four phases to apply next:
Teacher Phase
Learner Phase
Teacher's Self-Learning Phase
Learner's Self-Learning Phase
4. Phase Execution:
Based on the decision from the Q-learning step, one of the four phases is
implemented.
5. Stopping Check:
The algorithm checks whether a stopping criterion has been met. This criterion
might be a certain number of iterations or achieving a desired makespan.
6. End:
If the stopping condition is met, the algorithm terminates and returns the best
solution found so far, which represents the best schedule found for the DTHFSP.
7. Loop:
If the stopping condition is not met, the algorithm returns to step 3 and repeats
the process.
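The steps above can be sketched as a skeleton driver. Every callable here (the phase implementations, the agent's phase selection and update) is a hypothetical placeholder standing in for the paper's components, not the actual QTLBO code.

```python
import random

def qtlbo(evaluate, init_population, phases, select_phase, update_agent,
          max_iters=100):
    """Skeleton of the QTLBO loop. `phases` holds four callables (teacher,
    learner, teacher self-learning, learner self-learning); `select_phase`
    and `update_agent` stand in for the Q-learning agent."""
    pop = init_population()                  # step 1: initialization
    for _ in range(max_iters):
        pop.sort(key=evaluate)               # step 2: rank by makespan
        idx = select_phase(pop)              # step 3: Q-learning picks a phase
        pop = phases[idx](pop)               # step 4: run the chosen phase
        update_agent(idx, pop)               # reward feedback to the agent
    pop.sort(key=evaluate)
    return pop[0]                            # best schedule found

# Toy usage: minimize the sum of a list, a stand-in for (defuzzified) makespan.
best = qtlbo(
    evaluate=sum,
    init_population=lambda: [[random.randint(0, 9) for _ in range(4)]
                             for _ in range(10)],
    phases=[lambda p: p] * 4,                # identity phases, placeholders
    select_phase=lambda p: random.randrange(4),
    update_agent=lambda i, p: None,
    max_iters=5,
)
```

In the fuzzy setting, `evaluate` would rank solutions by a defuzzified makespan rather than a crisp sum.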
3.3.1 Results:
3.3.2 Discussion:
Further research is needed to explore how QTLBO scales to extremely large and
complex distributed networks. Additionally, advancements in explainable AI could improve
transparency in the system's decision-making process.
2) Fuzzy Processing Time Handling: This method can account for uncertain job processing
times, a common issue with new or complex tasks. This flexibility allows for more
realistic scheduling in dynamic environments.
3) Distributed Scheduling Advantage: Designed specifically for distributed two-stage
hybrid flow shops, it can optimize scheduling across multiple factories or production
lines, improving overall production network coordination and resource allocation.
4) Adaptability through Learning: The reinforcement learning aspect of Q-learning allows
the system to learn and adapt its scheduling decisions over time. This is beneficial in
environments where job characteristics, processing times, or machine availability change
frequently.
5) Potential for Continuous Improvement: As the system gathers more data and interacts
with the scheduling environment, it can continuously refine its decision-making,
potentially leading to long-term efficiency gains.
6) Exploration and Exploitation Balance: Q-learning can balance exploration (trying new
scheduling strategies) with exploitation (focusing on proven effective ones). This balance
can help the system discover even better solutions over time.
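One common way to realize this balance, assumed here purely for illustration, is to decay the exploration rate ε over time so that early iterations explore broadly and later ones exploit learned Q-values.

```python
import math

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, rate=0.01):
    """Exponential decay of the exploration rate from eps_start toward
    eps_end: more exploration early, more exploitation later."""
    return eps_end + (eps_start - eps_end) * math.exp(-rate * step)

print(decayed_epsilon(0))     # 1.0 at the start: pure exploration
print(decayed_epsilon(500))   # close to eps_end: mostly exploitation
```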
3.4.2 Disadvantages:
CHAPTER 4
In future work, we will attempt to solve the distributed scheduling problem with
uncertainty by using various metaheuristics. Previous works have mainly used a few
kinds of RL algorithms, particularly Q-learning. Related to this, the integration of other RL
algorithms with metaheuristics for production scheduling deserves further study.
REFERENCES
[1] M. Karimi-Mamaghan, M. Mohammadi, P. Meyer, A. M. Karimi-Mamaghan, and E.-G.
Talbi, Machine learning at the service of meta-heuristics for solving combinatorial
optimization problems: A state-of-the-art, Eur. J. Oper. Res., vol. 296, no. 2, pp. 393–422,
2022.
[2] J. Wang, D. M. Lei, and J. C. Cai, An adaptive artificial bee colony with
reinforcement learning for distributed three-stage assembly scheduling with maintenance,
Appl. Soft Comput., vol. 117, p. 108371, 2021.
[3] L. Wang, Z. X. Pan, and J. J. Wang, A review of reinforcement learning based intelligent
optimization for manufacturing scheduling, Complex Syst. Model. Simul., vol. 1, no. 4, pp. 257–
270, 2021.
[4] J. Q. Li, J. K. Li, L. J. Zhang, H. Y. Sang, Y. Y. Han, and Q. D. Chen, Solving type-2
fuzzy distributed hybrid flowshop scheduling using an improved brain storm optimization
algorithm, Int. J. Fuzzy Syst., vol. 23, pp. 1194–1212, 2021.
[5] Z. S. Shao, W. S. Shao, and D. C. Pi, Effective heuristics and metaheuristics for the
distributed fuzzy blocking flowshop scheduling problem, Swarm Evol. Comput., vol. 59, p.
100747, 2020.
[6] J. Wang, X. D. Wang, F. Chu, and J. B. Yu, An energy-efficient two-stage hybrid flow
shop scheduling problem in glass production, Int. J. Prod. Res., vol. 58, no. 8, pp. 2283–
2314, 2020.
[8] B. Fan, W. Yang, and Z. Zhang, Solving the two-stage hybrid flow shop scheduling
problem based on mutant firefly algorithm, J. Amb. Intel. Hum. Comp., vol. 10, no. 3, pp.
979–990, 2019.