Abstract: In recent years, maintenance optimization has attracted special attention due to the growing complexity of industrial systems. Maintenance costs are high for many systems, and preventive maintenance is effective when it increases the reliability and safety of operations at a reduced cost. The novelty of this research is to consider general repair in the modeling of multi-unit series systems and to solve the maintenance problem for such systems using the semi-Markov decision process (SMDP) framework. We propose an opportunistic maintenance policy for a series system composed of two main units. Unit 1, which is more expensive than unit 2, is subject to condition monitoring, and its deterioration is modeled by a gamma process. The hazard rate of unit 1 is estimated by the proportional hazards model (PHM), and two hazard-rate control limits are used as thresholds for maintenance interventions on unit 1. Maintenance on unit 2 is governed by an age control limit. The objective is to find the optimal control limits that minimize the long-run expected average cost per unit time. The proposed algorithm is applied to a numerical example to compare the effectiveness of the proposed policy (policy I) with policy II, which is similar to policy I except that replacement is performed instead of general repair. Results show that policy I leads to a lower average cost than policy II.
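To make the monitoring scheme concrete, the following sketch (all parameter values are hypothetical, not taken from the paper) simulates a stationary gamma deterioration process and applies a PHM hazard rate with an assumed Weibull baseline, together with the two hazard-rate control limits: below the lower limit the unit keeps running, between the limits a general repair is triggered, and above the upper limit the unit is replaced.

```python
import math
import random

def weibull_baseline(t, beta=2.0, eta=100.0):
    # Weibull baseline hazard: h0(t) = (beta/eta) * (t/eta)**(beta - 1)
    return (beta / eta) * (t / eta) ** (beta - 1)

def phm_hazard(t, x, gamma_cov=0.05):
    # Proportional hazards model: h(t, X) = h0(t) * exp(gamma_cov * X),
    # where X is the monitored deterioration level (the covariate)
    return weibull_baseline(t) * math.exp(gamma_cov * x)

def decide(t, x, h_repair=0.02, h_replace=0.06):
    # Two hazard-rate control limits: general repair below the replacement limit
    h = phm_hazard(t, x)
    if h >= h_replace:
        return "replace"
    if h >= h_repair:
        return "general repair"
    return "continue"

def gamma_path(t_max, dt=1.0, shape_rate=0.2, scale=1.0, seed=0):
    # Stationary gamma process: independent Gamma(shape_rate*dt, scale) increments
    rng = random.Random(seed)
    t, x, path = 0.0, 0.0, []
    while t < t_max:
        x += rng.gammavariate(shape_rate * dt, scale)
        t += dt
        path.append((t, x))
    return path
```

With these invented parameters, the rule keeps a young, lightly deteriorated unit running and escalates to general repair and then replacement as age and deterioration grow.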
Abstract: In this paper, the joint optimization of the
economic manufacturing quantity (EMQ), safety stock level,
and condition-based maintenance (CBM) is presented for a partially
observable, deteriorating system subject to random failure.
Demand is stochastic and is described by a Poisson process.
The stochastic model is developed and the optimization problem
is formulated in the semi-Markov decision process framework. A
modification of the policy iteration algorithm is developed to find
the optimal policy. A numerical example is presented to compare
the optimal policy with the policy considering zero safety stock.
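The paper's modified policy-iteration algorithm is not spelled out in the abstract; as a point of reference, standard policy iteration for a discounted-cost MDP can be sketched on a hypothetical two-state machine (state 0 = good, state 1 = worn; action 0 = produce, action 1 = maintain; all costs and probabilities are illustrative only).

```python
def policy_iteration(P, C, gamma=0.9):
    # P[a][s][s2]: transition probabilities, C[a][s]: one-step cost
    n, n_a = len(C[0]), len(C)
    policy = [0] * n
    while True:
        # policy evaluation: fixed-point iteration on V = C_pi + gamma * P_pi V
        V = [0.0] * n
        for _ in range(1000):
            V = [C[policy[s]][s]
                 + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
                 for s in range(n)]
        # policy improvement: greedy one-step lookahead
        new_policy = [min(range(n_a),
                          key=lambda a: C[a][s]
                          + gamma * sum(P[a][s][t] * V[t] for t in range(n)))
                      for s in range(n)]
        if new_policy == policy:
            return policy, V
        policy = new_policy

# hypothetical two-state machine
P = [[[0.8, 0.2], [0.0, 1.0]],   # action 0: produce
     [[1.0, 0.0], [1.0, 0.0]]]   # action 1: maintain (back to good)
C = [[1.0, 6.0],                 # producing in the worn state is expensive
     [3.0, 3.0]]                 # maintenance has a fixed cost
```

On this toy instance, the algorithm ends up producing in the good state and maintaining in the worn state; the paper's contribution is a modification of the evaluation/improvement steps suited to the average-cost SMDP setting.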
Abstract: In this paper, we propose a condition-based
maintenance policy for multi-unit systems considering the
existence of economic dependency among units. We consider a
system composed of N identical units, where each unit deteriorates
independently. The deterioration process of each unit is modeled as
a three-state continuous-time homogeneous Markov chain with two
working states and a failure state. The average production rate of
the units varies across working states, and the demand rate of the
system is constant. Units are inspected at equidistant time epochs,
and the decision to perform maintenance depends on the number of
units in the failure state. If the total number of failed units
exceeds a critical level, maintenance is initiated: failed units are
replaced correctively, and units in the deteriorated state are
maintained preventively. Our objective is to determine the optimal
number of failed units at which to initiate maintenance so as to
minimize the long-run expected average cost per unit time. The
problem is
formulated and solved in the semi-Markov decision process (SMDP)
framework. A numerical example is developed to demonstrate the
proposed policy, and a comparison with the corrective maintenance
policy is presented.
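A quick way to see the threshold policy at work is to simulate it. The sketch below (a discrete-time approximation with invented transition probabilities and costs, not the paper's SMDP model) degrades N units through good/deteriorated/failed states and triggers group maintenance once the number of failed units reaches a critical level k.

```python
import random

def average_cost(N=5, k=2, horizon=20000, seed=1):
    # per-inspection transition probabilities: good -> deteriorated -> failed
    P = [[0.90, 0.08, 0.02],
         [0.00, 0.85, 0.15],
         [0.00, 0.00, 1.00]]
    c_corr, c_prev, c_down = 50.0, 10.0, 5.0   # hypothetical costs
    rng = random.Random(seed)
    states = [0] * N
    total = 0.0
    for _ in range(horizon):
        for i in range(N):          # each unit deteriorates independently
            r, cum = rng.random(), 0.0
            for s, p in enumerate(P[states[i]]):
                cum += p
                if r < cum:
                    states[i] = s
                    break
        failed = states.count(2)
        total += c_down * failed    # lost production while units are down
        if failed >= k:             # threshold reached: group maintenance
            total += c_corr * failed + c_prev * states.count(1)
            states = [0] * N
    return total / horizon
```

Sweeping k from 1 to N and picking the minimizer mimics, by brute force, what the SMDP formulation obtains analytically.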
Abstract: This paper presents a maintenance policy for a system
consisting of two units. Unit 1 is gradually deteriorating and is
subject to soft failure. Unit 2 has a general lifetime distribution
and is subject to hard failure. The condition of unit 1
is monitored periodically, and the unit is considered failed when its
deterioration level reaches or exceeds a critical level N. When
unit 2 fails, the system is considered failed, and unit 2
is correctively replaced by the next inspection epoch. Unit 1
or unit 2 is preventively replaced when the deterioration level of
unit 1 or the age of unit 2 exceeds the corresponding preventive
maintenance (PM) level. At the time of a corrective or preventive
replacement of unit 2, there is an opportunity to replace unit 1 if
its deterioration level has reached the opportunistic maintenance
(OM) level. If unit 2 fails within an inspection interval, the system
stops operating even though unit 1 has not failed. A mathematical
model is derived to find
the preventive and opportunistic replacement levels for unit 1 and
the preventive replacement age for unit 2 that minimize the long-run
expected average cost per unit time. The problem is formulated and
solved in the semi-Markov decision process (SMDP) framework.
A numerical example is provided to illustrate the performance of
the proposed model, and a comparison with an optimal policy
without an opportunistic maintenance level for unit 1 is
carried out.
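The replacement rules described above can be captured in a small decision function. The sketch below is one interpretation of the stated policy with invented threshold values (failure level N, PM and OM levels for unit 1, and a PM age for unit 2); the SMDP optimization of these levels is not reproduced here.

```python
def maintenance_actions(x1, age2, unit2_failed,
                        fail_level=10, pm1=7, om1=4, pm_age2=100):
    # x1: deterioration level of unit 1; age2: age of unit 2
    actions = []
    replace2 = unit2_failed or age2 >= pm_age2
    if replace2:
        actions.append("replace unit 2 (corrective)" if unit2_failed
                       else "replace unit 2 (preventive)")
    if x1 >= fail_level:                 # unit 1 has (softly) failed
        actions.append("replace unit 1 (corrective)")
    elif x1 >= pm1:                      # preventive level reached
        actions.append("replace unit 1 (preventive)")
    elif replace2 and x1 >= om1:         # opportunity created by unit 2
        actions.append("replace unit 1 (opportunistic)")
    return actions
```

The opportunistic branch is the policy's distinguishing feature: unit 1 is replaced below its own PM level only when a replacement of unit 2 already stops the system.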
Abstract: In this paper, we present a model and an algorithm for
the calculation of the optimal control limit, average cost, sample size,
and the sampling interval for an optimal Bayesian chart to control
the proportion of defective items produced using a semi-Markov
decision process approach. The traditional p-chart has been widely
used for many years to control the proportion of defectives in
various kinds of production processes. It is well known that
traditional non-Bayesian charts are not optimal, yet very few optimal
Bayesian control charts have been developed in the literature, mostly
for a finite horizon. The objective of this paper is to develop
a fast computational algorithm to obtain the optimal parameters of a
Bayesian p-chart. The decision problem is formulated in the partially
observable framework and the developed algorithm is illustrated by
a numerical example.
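The Bayesian state that such a chart tracks is the posterior distribution of the defective proportion. Under the standard Beta-binomial conjugate model (a textbook choice, not necessarily the paper's exact formulation, with invented numbers), the update after each sample is one line:

```python
def update(a, b, defectives, n):
    # Beta(a, b) prior + Binomial(n, p) sample of size n -> Beta posterior
    return a + defectives, b + (n - defectives)

def posterior_mean(a, b):
    # mean of Beta(a, b), the point estimate of the defective proportion p
    return a / (a + b)

def out_of_control(a, b, p_limit=0.15):
    # signal when the posterior mean of p exceeds the control limit
    return posterior_mean(a, b) > p_limit
```

In the SMDP formulation, the posterior (or a statistic of it) is the observable state, and the control limit, sample size, and sampling interval are chosen jointly to minimize the long-run average cost.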
Abstract: A Markov decision process (MDP) based
methodology is implemented to establish the optimal
schedule that minimizes cost. The MDP problem is formulated
using information about the current state of the pipe, the
improvement cost, the failure cost, and a pipe deterioration
model. The objective function and the detailed dynamic programming
(DP) algorithm are modified because of the difficulty of
implementing conventional DP approaches. The optimal schedule
derived from the suggested model is compared with several policies
via Monte Carlo simulation. The validity of the solution and the
improvement in computational time are demonstrated.
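For reference, the conventional DP baseline that the paper modifies can be sketched as value iteration over a small hypothetical pipe-condition chain (states 0 = new through 3 = failed; action 0 = do nothing, action 1 = renew; all numbers illustrative, not the paper's model):

```python
def value_iteration(P, C, gamma=0.95, tol=1e-8):
    # P[a][s][t]: transition probabilities, C[a][s]: one-step cost
    n, n_a = len(C[0]), len(C)
    V = [0.0] * n
    while True:
        new_V = [min(C[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
                     for a in range(n_a)) for s in range(n)]
        if max(abs(new_V[s] - V[s]) for s in range(n)) < tol:
            policy = [min(range(n_a),
                          key=lambda a: C[a][s]
                          + gamma * sum(P[a][s][t] * new_V[t] for t in range(n)))
                      for s in range(n)]
            return policy, new_V
        V = new_V

# action 0: do nothing (pipe deteriorates); action 1: renew (back to state 0)
P = [[[0.9, 0.1, 0.0, 0.0],
      [0.0, 0.9, 0.1, 0.0],
      [0.0, 0.0, 0.9, 0.1],
      [0.0, 0.0, 0.0, 1.0]],
     [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]]
C = [[0.0, 1.0, 3.0, 50.0],   # failure state incurs a high recurring cost
     [20.0] * 4]              # renewal cost
```

The resulting policy leaves a new pipe alone and renews a failed one; intermediate states depend on the cost ratios, which is exactly the trade-off the scheduling model explores.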
Abstract: This paper presents a new problem-solving approach
that generates optimal policies for finite-state
stochastic sequential decision-making problems with high data
efficiency. The proposed algorithm iteratively builds and improves
an approximate Markov Decision Process (MDP) model along with
cost-to-go value approximations by generating finite-length
trajectories through the state space. The approach creates a synergy
between the evolving approximate model and the approximate
cost-to-go values to produce a sequence of improving policies that
finally converges to the optimal policy through an intelligent and
structured search of the policy space. The approach modifies the
policy update step of policy iteration to achieve fast and stable
convergence to the optimal policy. We apply the algorithm to a
non-holonomic
mobile robot control problem and compare its performance with
other Reinforcement Learning (RL) approaches, e.g., a) Q-learning,
b) Watkins Q(λ), c) SARSA(λ).
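As a minimal reference point for the RL baselines mentioned above, tabular Q-learning on a toy five-state chain (move left/right, reward only at the right end; all parameters invented) looks like this:

```python
import random

def q_learning(n_states=5, episodes=5000, alpha=0.2, gamma=0.9,
               eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[s][a], a: 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (ties broken toward "right")
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # one-step Q-learning (off-policy TD) update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

Q-learning needs many such trajectories; the data efficiency claimed above comes from additionally fitting an approximate MDP model to the same trajectories instead of discarding them after each TD update.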
Abstract: Many Wireless Sensor Network (WSN) applications necessitate secure multicast services for the purpose of broadcasting delay sensitive data like video files and live telecast at fixed time-slot. This work provides a novel method to deal with end-to-end delay and drop rate of packets. Opportunistic Routing chooses a link based on the maximum probability of packet delivery ratio. Null Key Generation helps in authenticating packets to the receiver. Markov Decision Process based Adaptive Scheduling algorithm determines the time slot for packet transmission. Both theoretical analysis and simulation results show that the proposed protocol ensures better performance in terms of packet delivery ratio, average end-to-end delay and normalized routing overhead.
Abstract: Markov games are a generalization of the Markov
decision process to a multi-agent setting. The two-player zero-sum
Markov game framework offers an effective platform for designing
robust controllers. This paper presents two novel controller design
algorithms that use ideas from the game-theory literature to produce
reliable controllers that maintain performance in the presence
of noise and parameter variations. A more widely used approach to
controller design is H∞ optimal control, which suffers from high
computational demand and may at times be infeasible. Our approach
generates an optimal control policy for the agent (controller) via a
simple Linear Program enabling the controller to learn about the
unknown environment, which in our formulation corresponds to
the behavior rules of the noise, modeled as the opponent. The
proposed controller architectures attempt to improve controller
reliability by a
gradual mixing of algorithmic approaches drawn from the game
theory literature and the Minimax-Q Markov game solution
approach, in a reinforcement-learning framework. We test the
proposed algorithms on a simulated Inverted Pendulum Swing-up
task and compare their performance against standard Q-learning.
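In Minimax-Q-style approaches, the controller's policy at each state is the minimax solution of a matrix game, usually obtained from a small linear program. For the 2x2 zero-sum case the LP has a closed-form solution, which the following sketch implements (an illustration of the game-theoretic machinery, not the paper's algorithm):

```python
def solve_2x2_zero_sum(A):
    # A[i][j]: payoff to the row (maximizing) player; column player minimizes
    (a, b), (c, d) = A
    # pure-strategy saddle point exists when maximin == minimax
    v_lower = max(min(a, b), min(c, d))
    v_upper = min(max(a, c), max(b, d))
    if v_lower == v_upper:
        return v_lower, None          # value, no mixing needed
    # otherwise mix to make the opponent indifferent between columns
    denom = (a - b) + (d - c)
    p = (d - c) / denom               # probability of playing row 0
    v = (a * d - b * c) / denom       # value of the game
    return v, p
```

For matching pennies the value is 0 with a uniform mixed strategy; the same computation, done per state with an LP for larger action sets, is the core step of the Minimax-Q solution approach.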
Abstract: Many existing studies use Markov decision
processes (MDPs) to model optimal route choice in
stochastic, time-varying networks. However, transforming
large volumes of time-varying traffic data into optimal route
decisions is computationally challenging when MDPs are
employed in real transportation networks. In this paper we model
finite-horizon MDPs using directed hypergraphs. It is shown that
the problem of route choice in stochastic, time-varying networks
can be formulated as a minimum-cost hyperpath problem, which
can be solved in linear time. We finally demonstrate the
significant computational advantages of the introduced
methods.
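The link between finite-horizon MDPs and hypergraphs is that backward induction visits the time-expanded states in reverse topological order, touching each (state, action, outcomes) hyperarc once, which is what yields linear time in the number of hyperarcs. A sketch on a hypothetical three-node network (origin A, intermediate B, destination D; all costs and probabilities invented):

```python
def backward_induction(T, states, actions, P, C, terminal):
    # V[t][s]: minimum expected cost-to-go from state s with T - t steps left
    V = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    V[T] = dict(terminal)
    for t in range(T - 1, -1, -1):        # reverse topological (time) order
        for s in states:
            best_a, best = None, float("inf")
            for a in actions[s]:          # each (s, a) is one hyperarc
                q = C[(s, a)] + sum(p * V[t + 1][s2] for s2, p in P[(s, a)])
                if q < best:
                    best, best_a = q, a
            V[t][s], policy[t][s] = best, best_a
    return V, policy

# hypothetical network: origin A, intermediate B, destination D
states = ["A", "B", "D"]
actions = {"A": ["r1", "r2"], "B": ["go"], "D": ["stay"]}
P = {("A", "r1"): [("D", 0.5), ("B", 0.5)],   # risky route: may detour via B
     ("A", "r2"): [("D", 1.0)],               # reliable but slower route
     ("B", "go"): [("D", 1.0)],
     ("D", "stay"): [("D", 1.0)]}
C = {("A", "r1"): 1.0, ("A", "r2"): 2.0, ("B", "go"): 1.0, ("D", "stay"): 0.0}
terminal = {"A": 10.0, "B": 10.0, "D": 0.0}   # penalty for not arriving
V, policy = backward_induction(2, states, actions, P, C, terminal)
```

With two steps remaining the cheap risky route is optimal at A, but with only one step left the reliable route is, illustrating why route choice in such networks must be time-indexed.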
Abstract: Considering a reservoir with periodic states and
different cost functions with penalties, its release rules can be
modeled as a periodic Markov decision process (PMDP). First,
we prove that the policy-iteration algorithm also works for the
PMDP. Then, using the policy-iteration algorithm, we obtain the
optimal policies for a special aperiodic reservoir model with
two cost functions under a large penalty and discuss the case
when the penalty is small.