Optimal Production and Maintenance Policy for a Partially Observable Production System with Stochastic Demand

In this paper, the joint optimization of the economic manufacturing quantity (EMQ), safety stock level, and condition-based maintenance (CBM) is presented for a partially observable, deteriorating system subject to random failure. The demand is stochastic and it is described by a Poisson process. The stochastic model is developed and the optimization problem is formulated in the semi-Markov decision process framework. A modification of the policy iteration algorithm is developed to find the optimal policy. A numerical example is presented to compare the optimal policy with the policy considering zero safety stock.

Off-Policy Q-learning Technique for Intrusion Response in Network Security

With the increasing dependency on our computer devices, we face the necessity of adequate, efficient and effective mechanisms, for protecting our network. There are two main problems that Intrusion Detection Systems (IDS) attempt to solve. 1) To detect the attack, by analyzing the incoming traffic and inspect the network (intrusion detection). 2) To produce a prompt response when the attack occurs (intrusion prevention). It is critical creating an Intrusion detection model that will detect a breach in the system on time and also challenging making it provide an automatic and with an acceptable delay response at every single stage of the monitoring process. We cannot afford to adopt security measures with a high exploiting computational power, and we are not able to accept a mechanism that will react with a delay. In this paper, we will propose an intrusion response mechanism that is based on artificial intelligence, and more precisely, reinforcement learning techniques (RLT). The RLT will help us to create a decision agent, who will control the process of interacting with the undetermined environment. The goal is to find an optimal policy, which will represent the intrusion response, therefore, to solve the Reinforcement learning problem, using a Q-learning approach. Our agent will produce an optimal immediate response, in the process of evaluating the network traffic.This Q-learning approach will establish the balance between exploration and exploitation and provide a unique, self-learning and strategic artificial intelligence response mechanism for IDS.

Optimal Opportunistic Maintenance Policy for a Two-Unit System

This paper presents a maintenance policy for a system consisting of two units. Unit 1 is gradually deteriorating and is subject to soft failure. Unit 2 has a general lifetime distribution and is subject to hard failure. Condition of unit 1 of the system is monitored periodically and it is considered as failed when its deterioration level reaches or exceeds a critical level N. At the failure time of unit 2 system is considered as failed, and unit 2 will be correctively replaced by the next inspection epoch. Unit 1 or 2 are preventively replaced when deterioration level of unit 1 or age of unit 2 exceeds the related preventive maintenance (PM) levels. At the time of corrective or preventive replacement of unit 2, there is an opportunity to replace unit 1 if its deterioration level reaches the opportunistic maintenance (OM) level. If unit 2 fails in an inspection interval, system stops operating although unit 1 has not failed. A mathematical model is derived to find the preventive and opportunistic replacement levels for unit 1 and preventive replacement age for unit 2, that minimize the long run expected average cost per unit time. The problem is formulated and solved in the semi-Markov decision process (SMDP) framework. Numerical example is provided to illustrate the performance of the proposed model and the comparison of the proposed model with an optimal policy without opportunistic maintenance level for unit 1 is carried out.

Trajectory-Based Modified Policy Iteration

This paper presents a new problem solving approach that is able to generate optimal policy solution for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov Decision Process (MDP) model along with cost-to-go value approximates by generating finite length trajectories through the state-space. The approach creates a synergy between an approximate evolving model and approximate cost-to-go values to produce a sequence of improving policies finally converging to the optimal policy through an intelligent and structured search of the policy space. The approach modifies the policy update step of the policy iteration so as to result in a speedy and stable convergence to the optimal policy. We apply the algorithm to a non-holonomic mobile robot control problem and compare its performance with other Reinforcement Learning (RL) approaches, e.g., a) Q-learning, b) Watkins Q(λ), c) SARSA(λ).

Optimizing Dialogue Strategy Learning Using Learning Automata

Modeling the behavior of the dialogue management in the design of a spoken dialogue system using statistical methodologies is currently a growing research area. This paper presents a work on developing an adaptive learning approach to optimize dialogue strategy. At the core of our system is a method formalizing dialogue management as a sequential decision making under uncertainty whose underlying probabilistic structure has a Markov Chain. Researchers have mostly focused on model-free algorithms for automating the design of dialogue management using machine learning techniques such as reinforcement learning. But in model-free algorithms there exist a dilemma in engaging the type of exploration versus exploitation. Hence we present a model-based online policy learning algorithm using interconnected learning automata for optimizing dialogue strategy. The proposed algorithm is capable of deriving an optimal policy that prescribes what action should be taken in various states of conversation so as to maximize the expected total reward to attain the goal and incorporates good exploration and exploitation in its updates to improve the naturalness of humancomputer interaction. We test the proposed approach using the most sophisticated evaluation framework PARADISE for accessing to the railway information system.

Optimal Maintenance Policy for a Partially Observable Two-Unit System

In this paper, we present a maintenance model of a two-unit series system with economic dependence. Unit#1 which is considered to be more expensive and more important, is subject to condition monitoring (CM) at equidistant, discrete time epochs and unit#2, which is not subject to CM has a general lifetime distribution. The multivariate observation vectors obtained through condition monitoring carry partial information about the hidden state of unit#1, which can be in a healthy or a warning state while operating. Only the failure state is assumed to be observable for both units. The objective is to find an optimal opportunistic maintenance policy minimizing the long-run expected average cost per unit time. The problem is formulated and solved in the partially observable semi-Markov decision process framework. An effective computational algorithm for finding the optimal policy and the minimum average cost is developed, illustrated by a numerical example.

Optimal Policy for a Deteriorating Inventory Model with Finite Replenishment Rate and with Price Dependant Demand Rate and Cycle Length Dependant Price

In this paper, an inventory model with finite and constant replenishment rate, price dependant demand rate, time value of money and inflation, finite time horizon, lead time and exponential deterioration rate and with the objective of maximizing the present worth of the total system profit is developed. Using a dynamic programming based solution algorithm, the optimal sequence of the cycles can be found and also different optimal selling prices, optimal order quantities and optimal maximum inventories can be obtained for the cycles with unequal lengths, which have never been done before for this model. Also, a numerical example is used to show accuracy of the solution procedure.