
2018 IEEE 14th International Conference on Automation Science and Engineering (CASE)

Munich, Germany, August 20-24, 2018

Real-time Control of Maintenance on Deteriorating Manufacturing System
Jing Huang, Student Member, IEEE, Qing Chang*, Member, IEEE, Jing Zou, Student Member, IEEE,
Jorge Arinez, Member, IEEE, and Guoxian Xiao, Member, IEEE

Research supported by the National Science Foundation Grants CMMI 1351160 and 1435534.
J. Huang, Q. Chang* and J. Zou are with the Department of Mechanical Engineering, Stony Brook University, Stony Brook, NY 11794, USA (email: jing.huang.3@stonybrook.edu; qing.chang@stonybrook.edu; jing.zou@stonybrook.edu).
J. Arinez and G. Xiao are with General Motors Research and Development Center, Warren, MI 48090, USA (email: jorge.arinez@gm.com; guoxiao.xiao@gm.com).
*Corresponding author
978-1-5386-3593-3/18/$31.00 ©2018 IEEE

Abstract—In modern manufacturing systems, maintenance operations are key to improving machine reliability and availability, and hence system productivity and quality. Maintenance can be roughly categorized into Corrective Maintenance (CM) and Preventive Maintenance (PM). Because the production system is highly stochastic and maintenance actions can be either perfect or imperfect, deciding when, where, and which type of maintenance action should be taken is complex. In this paper, a maintenance control law is proposed to schedule cost-effective maintenance, either CM or PM, in real time. The control cost consists of the resource cost and the immediate and potential production losses due to the stoppage caused by the maintenance action. A data-driven model is used to evaluate the production losses. A case study demonstrates the effectiveness of the proposed control method by comparing three different maintenance policies.

Index Terms—Maintenance management, Deteriorating system, Real-time control, Opportunity window, Data-driven modeling

I. INTRODUCTION
Most manufacturing systems are subject to deterioration due to the usage and aging of machines and operations in real industrial practice. Upon random failures, machines stop working unexpectedly, forcing plant management to react to the disruption passively. Such reactive action is corrective maintenance (CM). The outcomes and costs of such unexpected failures are very hard to predict and can be far beyond control. To avoid such uncontrollable situations, it is essential to conduct preventive maintenance (PM) before a failure actually happens. Effective and timely maintenance decision-making is therefore a non-trivial factor in the competitiveness of a manufacturing organization.
Optimal maintenance policies have been intensively investigated over the past decades. Researchers have found that, in reality, a maintenance action, either CM or PM, is not necessarily a complete replacement [1]. Three maintenance types are defined according to the degree of improvement. A perfect maintenance action restores the machine to "as good as new", whereas a minimal maintenance action only restores the machine's functionality without changing its aging status. An imperfect maintenance action restores the machine to a state between the two. Based on these maintenance types, many maintenance policies have been proposed for a single machine or a single-unit system [2]. For instance, Liao et al. [3] used a search algorithm to determine a reliability threshold at which a preventive maintenance action is taken, thus maximizing the availability of the machine. These studies provide valuable insights into machine reliability analysis.
However, modern manufacturing systems are characterized by complex structures of strongly interconnected machines and stochastic dynamics, so existing single-machine maintenance policies cannot be directly applied to machines in a multi-stage production line. Because these machines are interconnected, both upstream and downstream impacts need to be fully examined [4]. As a result, any maintenance decision, either CM or PM, should consider not only its effect on the single machine's degradation but also its impact on the overall system.
To address this complexity, several simulation-based studies have sought the optimal maintenance schedule, especially for PM [5]. Based on simulation, Roux et al. [6] optimized the interval of periodic PM to ensure a low level of failures and minimize the unavailability of the system. One drawback of this approach is that, in multi-stage manufacturing systems, the temporary stoppage of one machine does not necessarily lead to production losses. Arab et al. [7] addressed this issue by incorporating the remaining reliability of machines and work-in-process inventories into the simulation model to search for the optimal maintenance schedule. However, searching the huge solution space requires intensive computing resources, especially when the system is scaled up. Changes in processes and equipment, which are the norm in today's manufacturing industry, require corresponding changes and reconstructions of the simulation models, and subsequently higher resource consumption. Therefore, a systematic method is desired to manage both CM and PM through careful analysis of the maintenance function and production dynamics [8].
However, with respect to production analysis, most existing methods focus on the steady-state performance of the system, which cannot guide maintenance decision-making in real time. Consequently, in industrial practice, plant management has to schedule PM
during non-production shifts, breaks, or other scheduled downtime, which may incur unnecessary extra labor and overhead costs. Moreover, PM is not available until these breaks, so machines risk missing the best timing for maintenance and may end up under-maintained or over-maintained.
To resolve these issues, researchers have recently explored hidden opportunities for preventive maintenance during normal production time. Some studies omitted the buffers between machines and proposed that downtime caused by one machine could be used to perform preventive maintenance on the other, non-failed machines [9], [10]. Gu et al. [11] proposed that failure-induced starvation or blockage time could be an opportunity for maintenance. These ideas originated from direct observations rather than from the underlying characteristics of the system, and such opportunities can be very limited, especially in a well-maintained system. Chang et al. [12] found that a downtime event causes permanent production losses only when it impedes (blocks or starves) the slowest machine in the line. They defined the opportunity window of a machine as the largest possible stoppage time that does not incur such permanent production losses. Using real-time data collected by distributed sensors, Zou et al. [13] developed a data-driven model for production system analysis in which the cost analysis of downtime events was fully established. This modeling method and the concept of the opportunity window have since been extended to other issues in manufacturing systems, including energy saving and gantry assignment [14]–[16]. However, these works emphasized production dynamics modeling, and a notable gap remains between maintenance decision-making and production system analysis.
This paper aims to bridge the gap between machine reliability and production system dynamics and to facilitate real-time integrated production and maintenance decision-making in complex manufacturing systems. A novel real-time maintenance control law is developed that integrally considers the real-time production status and the maintenance cost of various maintenance types. Machine reliability and availability vary with the type of maintenance, such as perfect, imperfect, or minimal maintenance. This variance in turn affects the real-time status of the production system and is also considered in the overall control function.
The remainder of this paper is organized as follows. Section II states the system assumptions and introduces the production system model and the maintenance model. Section III derives a control cost function by analyzing a maintenance action and its impact on the overall system. Section IV proposes the real-time maintenance control law. Section V presents a case study. Conclusions and future work are given in Section VI.

II. SYSTEM MODELING
A. Notations and assumptions
The following notations are used in this paper.
• $S_i$ denotes the $i$th machine, where $1 \le i \le M$.
• $B_i$ denotes the $i$th buffer, where $2 \le i \le M$. With abuse of notation, $B_i$ also denotes the maximum capacity of each buffer.
• $T_i$ denotes the cycle time of machine $S_i$.
• $M^*$ denotes the index of the last slowest machine in the line.
• $b_i(t)$ denotes the level of buffer $B_i$ at time $t$.
• $OW_i(t)$ is the opportunity window of machine $S_i$ at time $t$.
• $c_p$ is the profit per part.
• $m \in \{1p, 2p, 1c, 2c, 3c\}$ denotes the maintenance type, where $1p$ and $2p$ denote perfect and imperfect preventive maintenance, respectively, and $1c$, $2c$, and $3c$ denote perfect, imperfect, and minimal corrective maintenance, respectively.
• $\vec{e}_{ij} = (i, m_{ij}, t_{ij}, d_{ij})$, $i = 1, 2, \dots, M$, $j = 1, 2, \dots$, denotes the $j$th maintenance action on machine $S_i$, which starts at time $t_{ij}$, lasts $d_{ij}$ units of time, and is of type $m_{ij}$.
• $PL_{ij}$ and $PPL_{ij}$ are the immediate production loss and the potential production loss incurred by maintenance action $\vec{e}_{ij}$, respectively.
• $FPL_{ij}$ is the production loss of the first downtime event after maintenance action $\vec{e}_{ij}$.
• $\eta$ is a discount rate that relates $PPL_{ij}$ to $FPL_{ij}$.
• $\Delta t$ is a look-ahead window used to predict the production loss of the first subsequent downtime event.
• $C_{res}^{m}$, $m = 1p, 2p, 1c, 2c, 3c$, denotes the resource cost of a maintenance action of each type.
• $d_m$, $m = 1p, 2p, 1c, 2c, 3c$, denotes the duration of a maintenance action of each type.
• $q_m$, $m = 1p, 2p, 1c, 2c, 3c$, denotes the improvement factor of each maintenance type.
• $\alpha_i$ and $\beta_i$, $i = 1, 2, \dots, M$, denote the Weibull distribution parameters of machine $S_i$.
• $Z_{ij}$ denotes the lifetime of machine $S_i$ after its $j$th maintenance action.
• $v_{ij}$ denotes the virtual age of machine $S_i$ immediately after its $j$th maintenance action.
• $g_i(t)$ denotes the age of machine $S_i$ at time $t$.
We make the following assumptions in this paper:
1) Each machine in the production line is a single deteriorating unit, and its failure time follows a Weibull
distribution. The failure rates of the machines are mutually independent and increase with time.
2) Corrective maintenance (CM) is taken upon failure; preventive maintenance (PM) is taken while the machine is still operational to prevent future failures.
3) A perfect maintenance action restores the machine to "as good as new". A minimal maintenance action only restores the machine's functionality without changing its aging status.
4) An imperfect maintenance action restores the machine to somewhere between old and new. The recovery effect is not stochastic and can be described by a deterministic improvement factor.
5) The duration of each maintenance type is deterministic.
6) CM can be perfect, imperfect, or minimal; PM can be perfect or imperfect.

B. Production system model
A serial production line consists of $M$ machines and $M-1$ buffers (Fig. 1). Each machine has a rated speed $1/T_i$, and each buffer has a finite capacity.

Figure 1. System structure of the serial production line

A data-driven modeling method for the manufacturing system is used in this paper [14]. The manufacturing system is modeled as a stochastic dynamic system:
$$\dot{\mathbf{X}} = \mathbf{F}(\mathbf{X}(t), \mathbf{U}(t), \mathbf{W}(t)) \tag{1}$$
where the state $\mathbf{X}(t) \in \mathbb{R}^M$ is the production count of each machine up to time $t$, $\mathbf{U}(t)$ is the control input, i.e., the active shutdowns to perform preventive maintenance, and $\mathbf{W}(t)$ represents the random failures of the machines. Given the control input and the random failures of the system, the system state can be calculated recursively based on our previous studies [14]. In addition, the level of buffer $B_{i+1}$ at any time $t$ can be evaluated as
$$b_{i+1}(t) = X_i(t) - X_{i+1}(t) + b_{i+1}(0) \tag{2}$$
Thus the buffer levels of the production line at time $t$, denoted as $\mathbf{b}(t) = [b_2(t), b_3(t), \dots, b_M(t)]$, are obtained.

C. Maintenance modeling
The reliability of machine $S_i$ is described by a Weibull model with parameters $\alpha_i$ and $\beta_i$. $\alpha_i$ is known as the characteristic lifetime or scale parameter, while $\beta_i$ is known as the shape parameter. The shape parameter $\beta_i$ determines the aging mode of the machine. When $\beta_i = 1$, the machine does not age at all, since the distribution reduces to an exponential distribution. When $\beta_i > 1$, the machine ages with time, i.e., it becomes more likely to fail as time increases. The latter case is of interest in this paper.
The failure rate of machine $S_i$ increases continuously with time until the machine receives a preventive maintenance or breaks down and a corrective maintenance has to be imposed. As Fig. 2 shows, the machine receives its $j$th maintenance (either CM or PM) at time $t_{ij}$ with a duration of $d_{ij}$ and resumes operation at time $t_{ij} + d_{ij}$.

Figure 2. Maintenance actions on machine $S_i$

Immediately after the $j$th maintenance action, the machine gains a virtual age $v_{ij}$ that depends on both the maintenance type and the age before the maintenance action [1]. Let $q_{m_{ij}}$ be the improvement factor of the $j$th maintenance action; then
$$v_{ij} = q_{m_{ij}} \left( v_{i(j-1)} + Z_{i(j-1)} \right) \tag{3}$$
where $v_{i(j-1)}$ is the virtual age of machine $S_i$ immediately after the $(j-1)$th maintenance action, $Z_{i(j-1)}$ is its survival time after the $(j-1)$th maintenance action, and the sum $v_{i(j-1)} + Z_{i(j-1)}$ is the age of machine $S_i$ when it is about to receive the $j$th action. Hence $q_{m_{ij}} = 0$ corresponds to a perfect maintenance, since the virtual age is reduced to zero; $q_{m_{ij}} = 1$ corresponds to a minimal maintenance; and $0 < q_{m_{ij}} < 1$ corresponds to an imperfect maintenance.
At any time $t$ after the maintenance action, the age $g_i(t)$ of machine $S_i$ is the sum of its virtual age and its actual surviving time:
$$g_i(t) = v_{ij} + \left[ t - (t_{ij} + d_{ij}) \right], \quad t \ge t_{ij} + d_{ij} \tag{4}$$
Let $p_i(t, t^*)$ denote the probability density function of the time to failure $t^*$ of machine $S_i$ evaluated at time $t$. It follows a Weibull distribution conditioned on the age of machine $S_i$:
$$p_i(t, t^*) = p_i\left(t^* \mid g_i(t)\right), \quad t \ge t_{ij} + d_{ij},\ t^* \ge 0 \tag{5}$$
More specifically,
$$p_i(t, t^*) = \frac{\beta_i}{\alpha_i} \left( \frac{t^* + g_i(t)}{\alpha_i} \right)^{\beta_i - 1} \exp\left( -\left( \frac{t^* + g_i(t)}{\alpha_i} \right)^{\beta_i} + \left( \frac{g_i(t)}{\alpha_i} \right)^{\beta_i} \right), \quad t \ge t_{ij} + d_{ij},\ t^* \ge 0 \tag{6}$$

III. CONTROL COST FUNCTION
A. Control cost function
To determine when, where, and what type of maintenance action should be taken to minimize the overall system cost, a control cost function is needed for maintenance control. In this paper, we focus on maintenance actions, and the cost of the $j$th maintenance action on machine $S_i$ is given as
$$C_{ij} = C_{res}^{m_{ij}} + c_p \left( PL_{ij} + PPL_{ij} \right) \tag{7}$$
where $C_{res}^{m_{ij}}$ is the cost of resources used during maintenance, including replacement parts and other consumables. It varies with the maintenance type: generally, the resource cost of a perfect maintenance is larger than that of an imperfect maintenance, and the resource cost of an imperfect maintenance is larger than that of a minimal one. The second term is the profit loss due to the machine unavailability caused by the maintenance action. $PL_{ij}$ and $PPL_{ij}$ are the immediate production loss and the future potential production loss, respectively, and they are discussed in the following subsections.
Since $C_{ij}$ is a one-time cost, comparing it directly across maintenance types would be unfair. The cost of a perfect maintenance action might be much higher than that of an imperfect maintenance action, but the former allows the machine to operate for a longer period before failure. Hence a cost rate function $CR_i$ is defined by spreading the cost over the expected lifetime $Z_{ij}$ after the action, i.e.,
$$CR_i(t) = \frac{C_{res}^{m_{ij}} + c_p \left( PL_{ij} + PPL_{ij} \right)}{E[Z_{ij}]}, \quad t \in [t_{ij}, t_{i(j+1)}) \tag{8}$$
where $E[Z_{ij}]$ can be evaluated as the expectation of the (conditional) Weibull distribution:
$$E[Z_{ij}] = \exp\left( \left( \frac{g_{ij}}{\alpha_i} \right)^{\beta_i} \right) \int_0^{\infty} \exp\left( -\left( \frac{t^* + g_{ij}}{\alpha_i} \right)^{\beta_i} \right) dt^* \tag{9}$$
The real-time cost rate of the whole production line is
$$CR(t) = \sum_{i=1}^{M} CR_i(t) \tag{10}$$
This cost rate function is used as the control cost function to guide maintenance decision-making.

B. Immediate production loss evaluation
Any maintenance action requires the machine to stop operating for a period of time. In most related works, such a stoppage is counted directly as unavailability time and integrated into the cost function. However, Chang et al. [12] found that a stoppage incurs permanent production loss only if it impedes (blocks or starves) the last slowest machine in the line. In other words, not all stoppages cause permanent production loss.
The largest possible stoppage time of machine $S_i$ that does not lead to permanent production loss is referred to as the opportunity window, denoted as $OW_i$:
$$OW_i(T_d) = \sup \left\{ d \ge 0 : \exists\, T^*(d) \text{ s.t. } \int_0^T s_M(t)\,dt = \int_0^T \tilde{s}_M(t; \vec{e})\,dt,\ \forall T \ge T^*(d) \right\} \tag{11}$$
where $\int_0^T \tilde{s}_M(t; \vec{e})\,dt$ and $\int_0^T s_M(t)\,dt$ are the production volumes of the end-of-line machine $S_M$ at time $T$ with and without the disruption event $\vec{e} = (i, m, t, d)$, respectively. The notation $T^*(d)$ signifies the potential dependency of $T^*$ on $d$.
Given a maintenance action $\vec{e}_{ij} = (i, m_{ij}, t_{ij}, d_{ij})$ taken on machine $S_i$, the immediate permanent production loss is
$$PL_{ij} = \max\left\{ \frac{d_{ij} - OW_i(t_{ij})}{T_{M^*}}, 0 \right\} \tag{12}$$
where $d_{ij}$ is the duration of the maintenance action, $OW_i(t_{ij})$ is the opportunity window evaluated at time $t_{ij}$ when the action is taken, and $T_{M^*}$ is the cycle time of the slowest machine $S_{M^*}$.

C. Potential production loss evaluation
The size of the opportunity window relies heavily on the buffer levels (if the machine is upstream of the slowest machine) or the buffer vacancies (if downstream) between the machine of interest and the slowest machine. The opportunity window accumulates when the machine operates faster than the slowest machine and shrinks when the machine stops while the slowest machine operates.
Given a maintenance action $\vec{e}_{ij}$ imposed on machine $S_i$, the stoppage propagates sequentially to nearby machines, since it causes blockage or starvation, and the opportunity windows of these nearby machines gradually shrink. If subsequent random failures occur on these machines, the permanent production losses may be amplified by the reduced opportunity windows. Strictly speaking, these losses cannot be directly attributed to the initial action $\vec{e}_{ij}$, but $\vec{e}_{ij}$ does contribute indirectly to the losses of subsequent failures. Therefore, a maintenance action may alter the health status of the overall system to some extent. To incorporate this impact, we refer to the difference between future production losses with and without $\vec{e}_{ij}$ as the potential production loss, denoted as $PPL_{ij}$.
As the manufacturing system is highly nonlinear and stochastic, a precise expression for the potential production loss is nearly impossible to obtain. Following Zou et al. [17], the production loss of the very first random failure event within a short look-ahead window $\Delta t$ is taken as an indicator:
$$PPL_{ij} = \eta \times FPL_{ij} \tag{13}$$
where $FPL_{ij}$ is the production loss incurred by the very first random failure of the whole line after the maintenance action $\vec{e}_{ij}$. A proper discount ratio $\eta$ can be chosen based on past experience.
Suppose that after time $t_{ij}$, the very first random failure $\vec{e}_k^*$ occurs on machine $S_k$ ($k = 1, 2, \dots, M$) at time $t^*$, and a perfect corrective maintenance is taken. Since no other random failure occurs between events $\vec{e}_{ij}$ and $\vec{e}_k^*$, the buffer levels $\mathbf{b}(t_{ij} + t^*)$ can be computed exactly, and so can $OW_k(t_{ij} + t^*)$. Let $PL_k^*(t^*)$ denote the production loss incurred by $\vec{e}_k^*$; then
$$PL_k^*(t^*) = \max\left\{ \frac{d_{1c} - OW_k(t_{ij} + t^*)}{T_{M^*}}, 0 \right\} \tag{14}$$
The probability associated with $PL_k^*(t^*)$ is the probability $p(k, t_{ij}, t^*)$ that the very first random failure occurs on machine $S_k$ at time $t^*$, measured from time $t_{ij}$. The machines are independent of each other in terms of reliability. Therefore $p(k, t_{ij}, t^*)$ is the joint probability over the $M$ machines, i.e.,
$$p(k, t_{ij}, t^*) = p_k(t_{ij}, t^*) \prod_{l=1, l \ne k}^{M} \left[ 1 - \int_0^{t^*} p_l(t_{ij}, \tau)\,d\tau \right] \tag{15}$$
Finally, $FPL_{ij}$ can be evaluated as the expected production loss:
$$FPL_{ij} = \sum_{k=1}^{M} \int_0^{\Delta t} PL_k^*(t^*)\, p(k, t_{ij}, t^*)\,dt^* \tag{16}$$
where $\Delta t$ is the look-ahead window.
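The following is a minimal Python sketch of the cost-rate building blocks from Sections II-C and III. It is not the authors' implementation. It evaluates the virtual age of Eq. (3), the expected residual lifetime of Eq. (9), the immediate production loss of Eq. (12) and the cost rate of Eq. (8).

Several inputs and choices are our own assumptions, made only for illustration:
• The function names are hypothetical.
• The opportunity window $OW$ and the first-failure loss $FPL$ are placeholder values. In the paper they come from the data-driven production model.
• The same $FPL$ is reused for both actions.
• The age entering Eq. (9) is taken to be the virtual age $v_{ij}$ right after the action.
• The improper integral is truncated at $10\alpha_i$ and evaluated with the trapezoidal rule.

Machine and maintenance parameters are those of $S_1$ and the case-study settings in Section V, with $S_3$ as the slowest machine ($T_{M^*} = 1.0$ min).

```python
import math


def virtual_age(q: float, prev_virtual_age: float, survival_time: float) -> float:
    """Eq. (3): virtual age right after a maintenance action with improvement factor q.

    q = 0 -> perfect (as good as new), q = 1 -> minimal, 0 < q < 1 -> imperfect.
    """
    return q * (prev_virtual_age + survival_time)


def expected_residual_life(age: float, alpha: float, beta: float, steps: int = 20_000) -> float:
    """Eq. (9): expected remaining lifetime of a Weibull machine that has reached `age`.

    The infinite upper limit is truncated at 10 * alpha (an assumed cutoff; the
    integrand is negligible beyond it for beta >= 1), and the integral is
    approximated with the trapezoidal rule.
    """
    upper = 10.0 * alpha
    h = upper / steps
    base = (age / alpha) ** beta

    def integrand(t_star: float) -> float:
        # exp((g/a)^b) * exp(-((t* + g)/a)^b), merged into one exp to avoid overflow
        return math.exp(base - ((t_star + age) / alpha) ** beta)

    inner = sum(integrand(k * h) for k in range(1, steps))
    return h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(upper))


def immediate_production_loss(duration: float, opportunity_window: float,
                              slowest_cycle_time: float) -> float:
    """Eq. (12): parts permanently lost when a stoppage outlasts the opportunity window."""
    return max((duration - opportunity_window) / slowest_cycle_time, 0.0)


def cost_rate(resource_cost: float, profit_per_part: float, pl: float, ppl: float,
              expected_life: float) -> float:
    """Eq. (8): one-time maintenance cost spread over the expected lifetime (dollars/min)."""
    return (resource_cost + profit_per_part * (pl + ppl)) / expected_life


if __name__ == "__main__":
    alpha, beta = 680.0, 2.0   # S1 Weibull parameters (Table I)
    c_p, eta = 300.0, 0.3      # profit per part and discount ratio (Section V)
    ow = 12.0                  # hypothetical opportunity window OW_1(t_ij), min
    fpl = 2.0                  # hypothetical first-failure loss FPL_ij, parts
    # Hypothetical history: virtual age 100 min plus 200 min survived since then.
    prev_v, survived = 100.0, 200.0

    # Perfect PM (1p): q = 0, d = 15 min, C_res = 120 (Table II).
    v1 = virtual_age(0.0, prev_v, survived)          # 0 min
    pl1 = immediate_production_loss(15.0, ow, 1.0)   # (15 - 12) / 1 = 3 parts
    cr_1p = cost_rate(120.0, c_p, pl1, eta * fpl, expected_residual_life(v1, alpha, beta))

    # Imperfect PM (2p): q = 0.7, d = 10 min, C_res = 70 (Table II).
    v2 = virtual_age(0.7, prev_v, survived)          # 210 min
    pl2 = immediate_production_loss(10.0, ow, 1.0)   # fits inside OW -> 0 parts
    cr_2p = cost_rate(70.0, c_p, pl2, eta * fpl, expected_residual_life(v2, alpha, beta))

    print(f"CR(1p) = {cr_1p:.2f} $/min")   # about 1.99
    print(f"CR(2p) = {cr_2p:.2f} $/min")   # about 0.57
```

With these made-up inputs, the imperfect PM has the lower cost rate. Its shorter duration fits inside the assumed opportunity window and its resource cost is lower, which outweighs its shorter expected lifetime. This is the kind of comparison the control law in Section IV makes when it ranks candidate actions.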
IV. REAL-TIME MAINTENANCE CONTROL LAW
The maintenance cost depends heavily on the real-time state of the production line. However, the state at a future time cannot be predicted completely, which makes it extremely difficult to find an optimal maintenance control policy over a global time horizon. A feasible alternative is a real-time control scheme that optimizes the maintenance cost rate, thus obtaining a near-optimal maintenance schedule. Our control objective is to minimize the real-time cost rate, i.e.,
$$\min \sum_{i=1}^{M} CR_i(t) \tag{17}$$
Let $CR_i^m(t)$, $m = 1p, 2p, 1c, 2c, 3c$, denote the cost rate assuming a maintenance action of type $m$ is taken on machine $S_i$ at time $t$. Then $CR_i^m(t)$ can be evaluated by inserting the corresponding maintenance action $\vec{e}_i^* = (i, m, t, d_m)$ into Eq. (8). The control law incorporates both CM and PM control procedures.
1) CM control procedure
If machine $S_i$ fails at time $t$, a CM action has to be imposed to resume the machine. The eligible maintenance types are $1c$, $2c$, and $3c$, and the chosen type is the one that minimizes the real-time cost rate of machine $S_i$, i.e.,
$$m_i = \arg\min_{m} \left\{ CR_i^m(t),\ m = 1c, 2c, 3c \right\} \tag{18}$$
2) PM control procedure
A time step $\sigma$ is chosen that is much smaller than the lifetimes of the machines. If machine $S_i$ is operational at the current step, it is eligible for PM. The cost rate of PM is $CR_i^m(t)$, $m = 1p, 2p$.
We can also decide not to take any action on the machine. The machine then risks failing within the next step with probability $P_i(t, \sigma)$:
$$P_i(t, \sigma) = \int_0^{\sigma} p_i(t, t^*)\,dt^* \tag{19}$$
The future system state at time $t + \sigma$ is uncertain. To evaluate the production losses at that time, one can use the worst-case estimate of the opportunity window of machine $S_i$ at time $t + \sigma$, i.e., the situation in which machine $S_i$ stops operating for $\sigma$ units of time. Thus, for the potential failure at the next time step, we can find the minimum of $CR_i^m(t + \sigma)$, $m = 1c, 2c, 3c$. Specifically, let $CR_i^0(t)$ denote the expected CM cost rate if we take no action at the current step and machine $S_i$ fails within the next step; then
$$CR_i^0(t) = P_i(t, \sigma) \cdot \min\left\{ CR_i^m(t + \sigma),\ m = 1c, 2c, 3c \right\} \tag{20}$$
The maintenance decision $m_i$ at the current step is
$$m_i = \arg\min_{m} \left\{ CR_i^m(t),\ m = 1p, 2p, 0 \right\} \tag{21}$$
where $m = 0$ denotes taking no action.

V. CASE STUDY
To demonstrate the effectiveness of the real-time maintenance control law, a case study compares the overall system profit under three maintenance policies. The overall profit over the whole production horizon $T$ is roughly calculated as
$$\text{Overall Profit} = c_p \cdot X_M(T) - \text{Maintenance Costs} \tag{22}$$
The production line used in this case study consists of 6 machines and 5 buffers (Fig. 3).

Figure 3. System structure of the serial production line

The system parameters are shown in Table I. The profit per part is assumed to be $c_p = 300$ US dollars/part. The look-ahead window for estimating the future potential production loss is $\Delta t = 100$ min, and the discount ratio is set to $\eta = 0.3$. The time step for deciding preventive maintenance is $\sigma = 25$ min. The maintenance parameters are given in Table II. The case study is carried out in simulation over a duration of four weeks, i.e., $T = 40320$ min.

TABLE I. PARAMETERS FOR THE PRODUCTION LINE
                                         S1      S2      S3      S4      S5      S6
Cycle time $T_i$ (min)                   0.92    0.88    1.0     0.87    0.87    0.92
Initial age $v_i$ (min)                  100     400     20      300     0       50
Characteristic life $\alpha_i$ (min)     680     720     900     800     700     750
Shape parameter $\beta_i$                2       2       2       2       2       2

                                         B2      B3      B4      B5      B6
Buffer capacity $B_i$                    20      20      20      20      20
Initial buffer level $b_i(0)$            2       3       5       8       2

TABLE II. PARAMETERS FOR MAINTENANCE ACTIONS
                                         1p      2p      1c      2c      3c
Resource cost $C_{res}$ (US dollars)     120     70      200     100     50
Duration $d_m$ (min)                     15      10      35      20      10
Improvement factor $q_m$                 0       0.7     0       0.7     1

Three policies are compared using simulation:
Policy 1: failure limit policy [18]
A perfect PM action ($1p$) is imposed once the machine reaches a predetermined failure rate threshold $\lambda$, and failures before that are corrected with minimal CM ($3c$). The threshold is chosen as $\lambda = 1/400$.
Policy 2: empirical policy
Policy 2 mimics common practice in current industry, where preventive maintenance is conducted only between shifts. Given a failure rate threshold $\lambda = 1/400$, a machine receives perfect PM ($1p$) during the break between shifts if it has reached the failure limit. If the machine fails during a shift, the maintenance action is perfect CM ($1c$) if the machine exceeds the failure limit threshold, or minimal CM ($3c$) if it is below the threshold. The shift length is 8 hours, and the PM break between shifts lasts 30 min.
Policy 3: the control law proposed in this paper.
Since PM is conducted during production time under Policy 3, no breaks between shifts are needed during the simulation. This is also one of the major benefits of the proposed control scheme.
The final overall profits under the three policies are shown in Table III. The proposed control law considerably increases the overall profit of the production system: compared with Policy 1 and Policy 2, the overall profit is improved by 25.8% and 15.5%, respectively. The control law proposed for both CM and PM in a deteriorating production line thus increases the overall profit significantly.

TABLE III. RESULTS FOR MAINTENANCE ACTIONS
                                        Policy 1                         Policy 2                          Policy 3
Overall profit, average (US dollars)    9,156,117.92                     9,980,446.82                      11,522,315.49
Overall profit, 95% CI (US dollars)     [8,988,002.12, 9,303,020.05]     [9,841,137.85, 10,095,282.76]     [11,520,036.96, 11,525,556.29]

VI. CONCLUSION
In this paper, a real-time control scheme is developed for maintenance decision-making in complex manufacturing systems. The cost rate function of a maintenance action is established by incorporating both the resource cost and the production losses due to machine stoppage. The CM decision is made by finding the minimum real-time cost rate. The PM decision is made by comparing the cost rate of PM with the expected CM cost rate in response to a potential failure at the next time step. The case study demonstrates the effectiveness of this control law.
In future work, we will consider stochastic durations and effects of maintenance actions, as well as constraints on the number and skill levels of maintenance staff.

ACKNOWLEDGMENT
This work was supported by the U.S. National Science Foundation (NSF) Grants No. CMMI 1351160 and 1435534.

REFERENCES
[1] H. Pham and H. Wang, "Imperfect maintenance," Eur. J. Oper. Res., vol. 94, no. 3, pp. 425–438, 1996.
[2] H. Wang, "A survey of maintenance policies of deteriorating systems," Eur. J. Oper. Res., vol. 139, no. 3, pp. 469–489, Jun. 2002.
[3] H. Liao, E. A. Elsayed, and L.-Y. Chan, "Maintenance of continuously monitored degrading systems," Eur. J. Oper. Res., vol. 175, no. 2, pp. 821–835, Dec. 2006.
[4] J. Zou, Q. Chang, Y. Lei, G. Xiao, and J. Arinez, "Stochastic Maintenance Opportunity Windows for Serial Production Line," in Volume 2: Materials; Biomanufacturing; Properties, Applications and Systems; Sustainable Manufacturing, 2015, p. V002T04A007.
[5] A. Alrabghi and A. Tiwari, "State of the art in simulation-based optimisation for maintenance systems," Comput. Ind. Eng., vol. 82, pp. 167–182, Apr. 2015.
[6] O. Roux, D. Duvivier, G. Quesnel, and E. Ramat, "Optimization of preventive maintenance through a combined maintenance-production simulation model," Int. J. Prod. Econ., vol. 143, no. 1, pp. 3–12, May 2013.
[7] A. Arab, N. Ismail, and L. S. Lee, "Maintenance scheduling incorporating dynamics of production system and real-time information from workstations," J. Intell. Manuf., vol. 24, no. 4, pp. 695–705, 2013.
[8] P. Muchiri, L. Pintelon, L. Gelders, and H. Martin, "Development of maintenance function performance measurement framework and indicators," Int. J. Prod. Econ., vol. 131, no. 1, pp. 295–302, May 2011.
[9] T. Xia, L. Xi, X. Zhou, and J. Lee, "Dynamic maintenance decision-making for series-parallel manufacturing system based on MAM-MTW methodology," Eur. J. Oper. Res., vol. 221, no. 1, pp. 231–240, 2012.
[10] N. Chalabi, M. Dahane, B. Beldjilali, and A. Neki, "Optimisation of preventive maintenance grouping strategy for multi-component series systems: Particle swarm based approach," Comput. Ind. Eng., vol. 102, pp. 440–451, 2016.
[11] X. Gu, X. Jin, and J. Ni, "Prediction of Passive Maintenance Opportunity Windows on Bottleneck Machines in Complex Manufacturing Systems," J. Manuf. Sci. Eng., vol. 137, no. 3, p. 031017, 2015.
[12] Q. Chang, J. Ni, P. Bandyopadhyay, S. Biller, and G. Xiao, "Maintenance Opportunity Planning System," J. Manuf. Sci. Eng., vol. 129, no. 3, p. 661, 2007.
[13] J. Zou, Q. Chang, Y. Lei, and J. Arinez, "Production System Performance Identification Using Sensor Data," IEEE Trans. Syst. Man, Cybern. Syst., vol. 48, no. 2, pp. 255–264, Feb. 2018.
[14] J. Zou, Q. Chang, J. Arinez, and G. Xiao, "Data-driven modeling and real-time distributed control for energy efficient manufacturing systems," Energy, vol. 127, pp. 247–257, 2017.
[15] J. Arinez, X. Ou, and Q. Chang, "Gantry Scheduling for Two-Machine One-Buffer Composite Work Cell by Reinforcement Learning," in Volume 4: Bio and Sustainable Manufacturing, 2017, p. V004T05A025.
[16] X. Ou, Q. Chang, N. Chakraborty, and J. Wang, "Gantry Scheduling for Multi-Gantry Production System by Online Task Allocation Method," IEEE Robot. Autom. Lett., vol. 2, no. 4, pp. 1848–1855, Oct. 2017.
[17] J. Zou, Q. Chang, and Y. Lei, "Production Loss Diagnosis and Prognosis Using Model-based Data-driven Method," IFAC-PapersOnLine, vol. 49, no. 12, pp. 1585–1590, 2016.
[18] C. H. Lie and Y. H. Chun, "An Algorithm for Preventive Maintenance Policy," IEEE Trans. Reliab., vol. 35, no. 1, pp. 71–75, 1986.