Abstract: In this paper, the joint optimization of the
economic manufacturing quantity (EMQ), safety stock level,
and condition-based maintenance (CBM) is presented for a partially
observable, deteriorating system subject to random failure. Demand
is stochastic and is modeled as a Poisson process.
The stochastic model is developed and the optimization problem
is formulated in the semi-Markov decision process framework. A
modification of the policy iteration algorithm is developed to find
the optimal policy. A numerical example is presented to compare
the optimal policy with the policy that assumes zero safety stock.
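The policy iteration scheme underlying such decision-process formulations can be sketched, in simplified form, for a finite discounted MDP (a toy discounted setting chosen for illustration, not the paper's semi-Markov model or its modified algorithm):

```python
import numpy as np

def policy_iteration(P, c, gamma=0.95):
    """Standard policy iteration for a finite discounted MDP.

    P: array of shape (A, S, S), one transition matrix per action.
    c: array of shape (S, A), immediate costs.
    Returns the optimal (cost-minimizing) policy and its value vector.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = c_pi exactly.
        P_pi = P[policy, np.arange(S), :]
        c_pi = c[np.arange(S), policy]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
        # Policy improvement: greedy one-step lookahead on v.
        q = c.T + gamma * (P @ v)          # shape (A, S)
        new_policy = q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

The paper's modification targets the policy update step of this loop; the sketch above is only the textbook baseline it departs from.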
Abstract: With our increasing dependence on computer
devices, we need adequate, efficient, and effective
mechanisms for protecting our networks. Intrusion Detection
Systems (IDS) attempt to solve two main problems:
1) detecting an attack by analyzing the incoming traffic and inspecting
the network (intrusion detection), and 2) producing a prompt response
when the attack occurs (intrusion prevention). It is critical to create an
intrusion detection model that detects a breach in the system in
time, and it is also challenging to make it respond automatically, with
an acceptable delay, at every stage of the monitoring
process. We cannot afford security measures with a high
computational cost, nor can we accept a
mechanism that reacts with a delay. In this paper, we
propose an intrusion response mechanism based on artificial
intelligence, and more precisely on reinforcement learning techniques
(RLT). The RLT help us create a decision agent that
controls the process of interacting with an uncertain environment.
The goal is to find an optimal policy representing the
intrusion response, that is, to solve the reinforcement learning
problem using a Q-learning approach. Our agent produces an
optimal immediate response while evaluating the network
traffic. This Q-learning approach establishes the balance between
exploration and exploitation and provides a unique, self-learning, and
strategic artificial intelligence response mechanism for IDS.
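A tabular Q-learning agent with an epsilon-greedy rule, of the kind the abstract describes, can be sketched as follows; the environment interface, reward coding, and all hyperparameter values are illustrative assumptions, not the paper's IDS setup:

```python
import random

def q_learning(step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    `step(s, a)` is the (hypothetical) environment: it returns
    (next_state, reward, done). Episodes start in state 0.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy balances exploration and exploitation.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the greedy next-state value.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

In an IDS setting, states would encode observed traffic features and actions would be candidate responses (e.g. ignore, alert, block); the learned Q-table then yields the immediate response by a greedy lookup.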
Abstract: This paper presents a maintenance policy for a system
consisting of two units. Unit 1 deteriorates gradually and is
subject to soft failure. Unit 2 has a general lifetime distribution
and is subject to hard failure. The condition of unit 1
is monitored periodically, and unit 1 is considered failed when its
deterioration level reaches or exceeds a critical level N. When
unit 2 fails, the system is considered failed, and unit 2
is correctively replaced by the next inspection epoch. Unit 1
or unit 2 is preventively replaced when the deterioration level of unit 1
or the age of unit 2 exceeds the corresponding preventive maintenance (PM)
level. At the time of a corrective or preventive replacement of unit
2, there is an opportunity to replace unit 1 if its deterioration
level reaches the opportunistic maintenance (OM) level. If unit
2 fails within an inspection interval, the system stops operating even though
unit 1 has not failed. A mathematical model is derived to find
the preventive and opportunistic replacement levels for unit 1 and
the preventive replacement age for unit 2 that minimize the long-run
expected average cost per unit time. The problem is formulated and
solved in the semi-Markov decision process (SMDP) framework.
A numerical example illustrates the performance of the
proposed model, and the proposed model is compared with an
optimal policy without an opportunistic maintenance level for unit 1.
Abstract: This paper presents a new problem-solving approach
that generates optimal policies for finite-state
stochastic sequential decision-making problems with high data
efficiency. The proposed algorithm iteratively builds and improves
an approximate Markov Decision Process (MDP) model, along with
approximate cost-to-go values, by generating finite-length trajectories
through the state space. The approach creates a synergy between the
evolving approximate model and the approximate cost-to-go values to
produce a sequence of improving policies that finally converges to the
optimal policy through an intelligent and structured search of the
policy space. The approach modifies the policy update step of
policy iteration so as to achieve speedy and stable convergence to
the optimal policy. We apply the algorithm to a non-holonomic
mobile robot control problem and compare its performance with
other Reinforcement Learning (RL) approaches: a) Q-learning,
b) Watkins' Q(λ), and c) SARSA(λ).
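The model-building step such an approach relies on — estimating an approximate MDP from sampled trajectories — can be sketched as a generic maximum-likelihood estimate; the `(state, action, cost, next_state)` tuple format is an assumption for illustration, not the paper's algorithm:

```python
from collections import defaultdict

def estimate_model(trajectories):
    """Empirical MDP model from finite-length trajectories.

    Each trajectory is a list of (state, action, cost, next_state).
    Returns maximum-likelihood transition probabilities and mean
    costs, keyed by (state, action).
    """
    counts = defaultdict(lambda: defaultdict(int))
    cost_sum = defaultdict(float)
    visits = defaultdict(int)
    for traj in trajectories:
        for s, a, c, s2 in traj:
            counts[(s, a)][s2] += 1
            cost_sum[(s, a)] += c
            visits[(s, a)] += 1
    # Relative frequencies are the maximum-likelihood transition model.
    P = {k: {s2: n / visits[k] for s2, n in succ.items()}
         for k, succ in counts.items()}
    c_hat = {k: cost_sum[k] / visits[k] for k in visits}
    return P, c_hat
```

Computing approximate cost-to-go values on such an evolving model, rather than on the true unknown dynamics, is what gives model-based schemes their data efficiency relative to the model-free baselines listed above.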
Abstract: Modeling the behavior of dialogue management in
the design of a spoken dialogue system using statistical methodologies
is currently a growing research area. This paper presents work
on developing an adaptive learning approach to optimize dialogue
strategy. At the core of our system is a method that formalizes dialogue
management as sequential decision making under uncertainty whose
underlying probabilistic structure is a Markov chain. Researchers
have mostly focused on model-free algorithms for automating the
design of dialogue management using machine learning techniques
such as reinforcement learning. But model-free algorithms face
a dilemma in balancing exploration against exploitation.
Hence we present a model-based online policy learning
algorithm using interconnected learning automata for optimizing
dialogue strategy. The proposed algorithm derives
an optimal policy that prescribes what action should be taken in
the various states of a conversation so as to maximize the expected total
reward to attain the goal, and it incorporates good exploration and
exploitation in its updates to improve the naturalness of human-computer
interaction. We test the proposed approach using the
sophisticated PARADISE evaluation framework on access to a
railway information system.
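A single probability update of the linear reward-inaction (L_R-I) type, one common building block for learning automata like those used here, can be sketched as follows; the learning rate and the binary reward coding are illustrative assumptions, not the paper's interconnected scheme:

```python
def lri_update(p, action, reward, lr=0.1):
    """One linear reward-inaction (L_R-I) automaton step.

    p: list of action probabilities (sums to 1). On reward == 1 the
    chosen action's probability moves toward 1 and the others shrink
    proportionally; on reward == 0 the vector is left unchanged.
    """
    if reward:
        p = [pi + lr * ((1.0 if i == action else 0.0) - pi)
             for i, pi in enumerate(p)]
    return p
```

In an interconnected arrangement, one such automaton sits at each dialogue state and the environment's reward couples their updates, so the joint action probabilities converge toward a policy while still sampling (exploring) nonpreferred actions with positive probability.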
Abstract: In this paper, we present a maintenance model of a
two-unit series system with economic dependence. Unit#1, which is
considered to be more expensive and more important, is subject to
condition monitoring (CM) at equidistant, discrete time epochs, and
unit#2, which is not subject to CM, has a general lifetime distribution.
The multivariate observation vectors obtained through condition
monitoring carry partial information about the hidden state of unit#1,
which can be in a healthy or a warning state while operating. Only the
failure state is assumed to be observable for both units. The objective
is to find an optimal opportunistic maintenance policy minimizing
the long-run expected average cost per unit time. The problem
is formulated and solved in the partially observable semi-Markov
decision process framework. An effective computational algorithm
for finding the optimal policy and the minimum average cost is
developed and illustrated by a numerical example.
Abstract: In this paper, an inventory model is developed with a
finite and constant replenishment rate, a price-dependent demand
rate, time value of money and inflation, a finite time horizon, lead
time, an exponential deterioration rate, and the objective of
maximizing the present worth of the total system profit. Using a
dynamic programming based solution algorithm, the optimal
sequence of cycles can be found, and the different optimal
selling prices, optimal order quantities, and optimal maximum
inventories can be obtained for cycles of unequal length,
which has not been done before for this model. A
numerical example is used to demonstrate the accuracy of the
solution procedure.
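The cycle-sequencing idea behind such a dynamic programming solution can be sketched as a backward recursion over the finite horizon; the `cycle_profit` oracle below is a hypothetical stand-in for the model's present-worth profit of one cycle, not the paper's actual profit function:

```python
def optimal_cycles(horizon, cycle_profit):
    """Partition a finite horizon into replenishment cycles by DP.

    cycle_profit(i, j) gives the present-worth profit of a single
    cycle spanning periods i..j (a hypothetical oracle). f[i] is the
    best achievable profit over the remaining periods i..horizon.
    """
    f = [0.0] * (horizon + 1)
    choice = [None] * horizon
    for i in range(horizon - 1, -1, -1):
        best, arg = float("-inf"), None
        for j in range(i + 1, horizon + 1):
            val = cycle_profit(i, j) + f[j]
            if val > best:
                best, arg = val, j
        f[i], choice[i] = best, arg
    # Recover the optimal cycle boundaries from the stored choices.
    cuts, i = [0], 0
    while i < horizon:
        i = choice[i]
        cuts.append(i)
    return f[0], cuts
```

Because each cycle's length is decided independently in the recursion, the resulting cycles may have unequal lengths, each with its own selling price and order quantity, which is the feature the abstract highlights.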