Optimal Production and Maintenance Policy for a Partially Observable Production System with Stochastic Demand

This paper presents the joint optimization of the economic manufacturing quantity (EMQ), the safety stock level, and condition-based maintenance (CBM) for a partially observable, deteriorating system subject to random failure. Demand is stochastic and is modeled as a Poisson process. A stochastic model is developed, and the optimization problem is formulated in the semi-Markov decision process (SMDP) framework. A modification of the policy iteration algorithm is developed to find the optimal policy. A numerical example compares the optimal policy with a policy that holds zero safety stock.
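To illustrate why safety stock matters under Poisson demand, the following minimal sketch computes the probability that demand over an interval exceeds the available safety stock. It is not the paper's model: the function names, the parameters `rate`, `duration`, and `safety_stock`, and the assumption of a fixed-length interval (for example, a maintenance downtime) are all hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k arrivals for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def stockout_probability(rate, duration, safety_stock):
    """P(demand during `duration` exceeds `safety_stock`) for Poisson demand.

    rate         -- assumed demand rate (units per unit time)
    duration     -- assumed length of the interval with no production
    safety_stock -- assumed number of units held in reserve (non-negative integer)
    """
    mean = rate * duration
    # Probability that demand is fully covered by the safety stock.
    covered = sum(poisson_pmf(k, mean) for k in range(safety_stock + 1))
    return 1.0 - covered


# Hypothetical numbers: compare zero safety stock with a few positive levels.
for level in (0, 2, 5):
    print(level, stockout_probability(rate=2.0, duration=1.0, safety_stock=level))
```

Under these assumptions the stockout probability decreases as the safety stock level rises, which is the trade-off a zero-safety-stock policy gives up.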

Trajectory-Based Modified Policy Iteration

This paper presents a new problem-solving approach that generates optimal policies for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov decision process (MDP) model, along with approximate cost-to-go values, by generating finite-length trajectories through the state space. The approach exploits the synergy between the evolving approximate model and the approximate cost-to-go values to produce a sequence of improving policies that converges to the optimal policy through an intelligent, structured search of the policy space. It modifies the policy update step of policy iteration to obtain fast and stable convergence to the optimal policy. We apply the algorithm to a non-holonomic mobile robot control problem and compare its performance with other reinforcement learning (RL) approaches, such as (a) Q-learning, (b) Watkins's Q(λ), and (c) SARSA(λ).
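For orientation, the sketch below shows textbook policy iteration with cost-to-go values on a small, fully known MDP. It is not the trajectory-based algorithm of the abstract, which also learns the model and modifies the policy update step; it only illustrates the baseline evaluate-then-improve loop that such methods build on. The two-state example, the transition table `P`, the cost table `C`, and the discount factor are hypothetical.

```python
def policy_iteration(P, C, gamma=0.9, tol=1e-8, max_iters=100):
    """Textbook policy iteration for a finite MDP with costs to minimize.

    P[s][a] -- list of (next_state, probability) pairs
    C[s][a] -- immediate cost of taking action a in state s
    """
    n_states = len(P)
    policy = [0] * n_states
    J = [0.0] * n_states  # cost-to-go estimates

    def q(s, a):
        # Expected discounted cost of taking a in s, then following J.
        return C[s][a] + gamma * sum(p * J[s2] for s2, p in P[s][a])

    for _ in range(max_iters):
        # Policy evaluation: sweep until the cost-to-go values stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_value = q(s, policy[s])
                delta = max(delta, abs(new_value - J[s]))
                J[s] = new_value
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the current J.
        new_policy = [
            min(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n_states)
        ]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, J


# Hypothetical example: state 0 = good, state 1 = degraded;
# action 0 = continue, action 1 = repair.
P = [
    [[(0, 0.8), (1, 0.2)], [(0, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
C = [
    [0.0, 5.0],
    [4.0, 5.0],
]
policy, J = policy_iteration(P, C)
print(policy, J)
```

In this toy example the loop settles on continuing in the good state and repairing in the degraded state. A trajectory-based variant would replace the known `P` and `C` with estimates built from sampled trajectories.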