Abstract: The availability of inexpensive yet capable hardware allows for an increased level of automation and self-optimization in the context of Industry 4.0. However, such agents require high-quality information about their surroundings along with a robust collision-avoidance strategy; otherwise, they may cause expensive damage to equipment or to other agents. Manually defining a strategy that covers all possibilities is both time-consuming and counter-productive given the capabilities of modern hardware. This paper explores a model-free, self-optimizing obstacle-avoidance strategy for multiple autonomous agents in a simulated dynamic environment using the Q-learning algorithm.
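A minimal tabular Q-learning sketch of learned obstacle avoidance follows. It assumes a hypothetical 5×5 grid, a single learning agent, and other agents or equipment modelled as static obstacle cells, which simplifies the paper's multi-agent, dynamic setting. The grid layout, rewards, and hyperparameters are illustrative assumptions. Each step applies the standard update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)].

```python
import random

ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}   # hypothetical cells occupied by equipment or other agents
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))  # up, down, left, right

def step(state, a):
    """Return (next_state, reward, done); hitting a wall or obstacle leaves the agent in place."""
    r, c = state[0] + MOVES[a][0], state[1] + MOVES[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS):
        return state, -1.0, False          # bumped into the boundary
    if (r, c) in OBSTACLES:
        return state, -10.0, False         # collision: heavy penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True
    return (r, c), -1.0, False             # ordinary move: small step cost

def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = START
        for _ in range(100):               # cap the episode length
            qs = Q.setdefault(s, [0.0] * 4)
            if rng.random() < eps:         # explore
                a = rng.randrange(4)
            else:                          # exploit
                a = max(range(4), key=qs.__getitem__)
            s2, reward, done = step(s, a)
            target = reward if done else reward + gamma * max(Q.setdefault(s2, [0.0] * 4))
            qs[a] += alpha * (target - qs[a])
            s = s2
            if done:
                break
    return Q

def greedy_path(Q, max_steps=20):
    """Follow the learned greedy policy from START."""
    s, path = START, [START]
    for _ in range(max_steps):
        if s == GOAL:
            break
        qs = Q.get(s, [0.0] * 4)
        s, _, _ = step(s, max(range(4), key=qs.__getitem__))
        path.append(s)
    return path
```

Because a collision costs far more than a detour, the learned greedy policy routes around the obstacle cells without any hand-written avoidance rule.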
Abstract: Current trends in remote health monitoring, which seek to capitalize on Internet of Things applications, have increased the need for efficient and interference-free communication in Wireless Body Area Network (WBAN) scenarios. Co-existence interference among WBANs aggravates congestion in already over-crowded radio bands, which calls for efficient Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) strategies and improved interference management. Existing solutions rely on simplistic heuristics to address interference. This article investigates reinforcement learning for efficient interference management in co-existence scenarios, with an emphasis on homogeneous interference. We propose QIM-MAC, a smart CSMA/CA mechanism based on reinforcement learning that uses sense slots effectively while incurring minimal interference. Simulation results across several scenarios show that the proposed approach maximizes average network throughput and packet delivery ratio while minimizing packet loss ratio, energy consumption, and average delay.
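The idea of learning which sense slot to use can be illustrated with a stateless (bandit-style) Q-learning sketch. This is not QIM-MAC itself: the per-slot interference probabilities, the ±1 reward, and the ε-greedy exploration are hypothetical choices made for illustration.

```python
import random

def learn_sense_slot(interference_prob, rounds=2000, alpha=0.1, eps=0.1, seed=0):
    """Learn a value per sense slot; interference_prob[i] is a hypothetical chance that
    a co-existing WBAN interferes when slot i is used."""
    rng = random.Random(seed)
    q = [0.0] * len(interference_prob)
    for _ in range(rounds):
        if rng.random() < eps:                       # explore a random slot
            slot = rng.randrange(len(q))
        else:                                        # exploit the best slot so far
            slot = max(range(len(q)), key=q.__getitem__)
        collided = rng.random() < interference_prob[slot]
        reward = -1.0 if collided else 1.0
        q[slot] += alpha * (reward - q[slot])        # stateless Q-update (gamma = 0)
    return q

q = learn_sense_slot([0.8, 0.6, 0.1, 0.7])
best_slot = max(range(len(q)), key=q.__getitem__)
```

Over time the node concentrates its transmissions on the slot with the least observed interference, while the ε-greedy exploration keeps re-checking the others in case conditions change.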
Abstract: With our increasing dependence on computer devices, we need adequate, efficient, and effective mechanisms for protecting our networks. Intrusion Detection Systems (IDS) attempt to solve two main problems: 1) detecting an attack by analyzing the incoming traffic and inspecting the network (intrusion detection), and 2) producing a prompt response when the attack occurs (intrusion prevention). Creating an intrusion detection model that detects a breach in the system in time is critical, and making it respond automatically and with an acceptable delay at every stage of the monitoring process is challenging. We cannot afford security measures that consume excessive computational power, nor can we accept a mechanism that reacts with a delay. In this paper, we propose an intrusion response mechanism based on artificial intelligence, more precisely on reinforcement learning techniques (RLT). The RLT allow us to create a decision agent that controls the process of interacting with an uncertain environment. The goal is to find an optimal policy, which represents the intrusion response; we therefore solve the reinforcement learning problem using a Q-learning approach. Our agent produces an optimal immediate response while evaluating the network traffic. This Q-learning approach balances exploration and exploitation and provides a unique, self-learning, strategic artificial-intelligence response mechanism for IDS.
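A minimal sketch of a Q-learning intrusion-response agent follows, assuming a hypothetical environment in which the detector reports one of three traffic classes and the agent chooses among three responses. The reward table and the rule that an unblocked attack tends to persist are assumptions, not taken from the paper; ε-greedy action selection is shown as one common way to balance exploration and exploitation.

```python
import random

STATES = ("normal", "suspicious", "attack")   # hypothetical traffic classes from the detector
ACTIONS = ("allow", "alert", "block")         # hypothetical response actions
REWARD = {                                    # hypothetical rewards; false positives are penalised
    "normal":     {"allow": 1.0,  "alert": -0.2, "block": -1.0},
    "suspicious": {"allow": -1.0, "alert": 1.0,  "block": -0.5},
    "attack":     {"allow": -5.0, "alert": -1.0, "block": 2.0},
}

def next_state(state, action, rng):
    # Assumption: an attack that is not blocked persists with probability 0.8.
    if state == "attack" and action != "block" and rng.random() < 0.8:
        return "attack"
    return rng.choice(STATES)

def train(steps=30000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    s = rng.choice(STATES)
    for _ in range(steps):
        if rng.random() < eps:                # explore
            a = rng.choice(ACTIONS)
        else:                                 # exploit
            a = max(Q[s], key=Q[s].get)
        r = REWARD[s][a]
        s2 = next_state(s, a, rng)
        Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
        s = s2
    return Q

def response_policy(Q):
    """Greedy response for each traffic class."""
    return {s: max(Q[s], key=Q[s].get) for s in STATES}
```

In this toy environment, responses to normal and suspicious traffic do not change the next traffic class, so the learned policy there matches the best immediate reward; blocking is additionally favoured for attacks because unblocked attacks tend to persist.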
Abstract: This paper presents an approach for making optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs Subjective Utility Quantal Response (SUQR) to formulate a subjective utility function based on the evaluation of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforcement learning algorithm called Naïve Q-Learning. Naïve Q-Learning belongs to the category of active, model-free Machine Learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine game theory (GT) and ML algorithms to discover optimal cyber security policies. The proposed security optimization components will be validated on a collaborative cloud platform based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.
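The SUQR component can be sketched as follows. The attacker's subjective utility for target i is a weighted sum SU_i = w1·x_i + w2·R_i^a + w3·P_i^a of the defender's coverage probability x_i and the attacker's reward and penalty, and the attack probabilities are a softmax (quantal response) over these utilities. The weights, payoffs, and the two-target grid search over coverage are hypothetical; SHARP's probability weighting and attractiveness terms, as well as the Naïve Q-Learning layer, are omitted.

```python
import math

def suqr_attack_distribution(coverage, att_reward, att_penalty, w=(-9.0, 0.5, 0.3)):
    """Quantal response over subjective utilities; the weights w are hypothetical."""
    su = [w[0] * x + w[1] * r + w[2] * p
          for x, r, p in zip(coverage, att_reward, att_penalty)]
    m = max(su)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in su]
    z = sum(exps)
    return [e / z for e in exps]

def defender_utility(coverage, q, def_reward, def_penalty):
    """Expected defender utility when target i is attacked with probability q[i]."""
    return sum(qi * (x * rd + (1 - x) * pd)
               for qi, x, rd, pd in zip(q, coverage, def_reward, def_penalty))

def best_two_target_coverage(att_reward, att_penalty, def_reward, def_penalty, steps=100):
    """Grid search over splits of one defender resource between two targets."""
    best = None
    for k in range(steps + 1):
        cov = [k / steps, 1 - k / steps]
        q = suqr_attack_distribution(cov, att_reward, att_penalty)
        u = defender_utility(cov, q, def_reward, def_penalty)
        if best is None or u > best[0]:
            best = (u, cov)
    return best
```

For example, `best_two_target_coverage([8, 2], [-4, -1], [3, 1], [-6, -1])` places more than half of the single resource on the first, higher-value target, anticipating that a boundedly rational attacker is drawn to it.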
Abstract: Economic Dispatch is one of the most important power system management tools. It is used to allocate power generation among the generating units so as to meet the load demand. The Economic Dispatch problem is a large-scale, nonlinear, constrained optimization problem. In general, heuristic optimization techniques are used to solve the non-convex Economic Dispatch problem. In this paper, ideas from Reinforcement Learning are proposed to solve the non-convex Economic Dispatch problem. Q-Learning is a reinforcement learning technique; here, each generating unit learns the optimal schedule of generated power that minimizes the generation cost function. Eligibility traces are used to speed up the Q-Learning process. Q-Learning with eligibility traces is used to solve Economic Dispatch problems with valve-point loading effects, multiple fuel options, and power transmission losses.
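A compact sketch of Q-learning with eligibility traces (Watkins's Q(λ)) on a toy economic dispatch problem follows. Assumptions: units are dispatched one after another, the state is (unit index, power already allocated), actions are discrete output levels in 50 MW steps, the reward is the negative valve-point cost F_i(P) = a_i + b_i·P + c_i·P² + |e_i·sin(f_i·(P_min,i - P))| scaled by 1/1000, and a demand-mismatch penalty is applied at the end of each episode. The unit coefficients, demand, and hyperparameters are illustrative; transmission losses and multiple fuel options are not modelled, and this sequential formulation is not necessarily the per-unit formulation used in the paper.

```python
import math
import random

# Illustrative 3-unit data: (a, b, c, e, f, P_min, P_max)
UNITS = [
    (561.0, 7.92, 0.00156, 300.0, 0.0315, 100, 600),
    (310.0, 7.85, 0.00194, 200.0, 0.042, 100, 400),
    (78.0, 7.97, 0.00482, 150.0, 0.063, 50, 200),
]
DEMAND, STEP, PENALTY = 850, 50, 0.1   # MW, MW, penalty per MW of mismatch (scaled units)
LEVELS = [list(range(u[5], u[6] + 1, STEP)) for u in UNITS]

def valve_point_cost(unit, p):
    a, b, c, e, f, p_min, _ = unit
    return a + b * p + c * p * p + abs(e * math.sin(f * (p_min - p)))

def train(episodes=5000, alpha=0.1, lam=0.8, eps=0.1, seed=0):
    """Watkins's Q(lambda) with accumulating traces on an undiscounted 3-step episode."""
    rng = random.Random(seed)
    Q = {}

    def q(s, a):
        return Q.get((s, a), 0.0)

    def greedy(s):
        return max(LEVELS[s[0]], key=lambda a: q(s, a))

    def choose(s):                               # epsilon-greedy
        return rng.choice(LEVELS[s[0]]) if rng.random() < eps else greedy(s)

    for _ in range(episodes):
        traces, s = {}, (0, 0)
        a = choose(s)
        while True:
            k, allocated = s
            s2 = (k + 1, allocated + a)
            r = -valve_point_cost(UNITS[k], a) / 1000.0
            done = s2[0] == len(UNITS)
            if done:                             # terminal: add the demand-mismatch penalty
                delta = r - PENALTY * abs(s2[1] - DEMAND) - q(s, a)
            else:
                a2, a_star = choose(s2), greedy(s2)
                if q(s2, a2) == q(s2, a_star):
                    a_star = a2                  # tie: treat the chosen action as greedy
                delta = r + q(s2, a_star) - q(s, a)
            traces[(s, a)] = traces.get((s, a), 0.0) + 1.0
            for key, e in traces.items():
                Q[key] = Q.get(key, 0.0) + alpha * delta * e
            if done:
                break
            # Decay traces after a greedy action; cut them after an exploratory one.
            traces = {key: lam * e for key, e in traces.items()} if a2 == a_star else {}
            s, a = s2, a2
    return Q

def greedy_schedule(Q):
    """Read the learned dispatch (MW per unit) off the Q-table."""
    s, schedule = (0, 0), []
    for k in range(len(UNITS)):
        a = max(LEVELS[k], key=lambda p: Q.get((s, p), 0.0))
        schedule.append(a)
        s = (k + 1, s[1] + a)
    return schedule
```

The traces let the terminal mismatch penalty propagate back to the earlier units' decisions in a single update, which is the speed-up that eligibility traces are meant to provide.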
Abstract: This paper presents a new problem-solving approach that generates optimal policies for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov Decision Process (MDP) model, along with approximate cost-to-go values, by generating finite-length trajectories through the state space. The approach creates a synergy between the evolving approximate model and the approximate cost-to-go values to produce a sequence of improving policies that converges to the optimal policy through an intelligent, structured search of the policy space. The approach modifies the policy update step of policy iteration so as to obtain fast and stable convergence to the optimal policy. We apply the algorithm to a non-holonomic mobile robot control problem and compare its performance with other Reinforcement Learning (RL) approaches, namely a) Q-learning, b) Watkins's Q(λ), and c) SARSA(λ).
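A generic certainty-equivalence sketch of the underlying idea follows: estimate an approximate MDP (transition probabilities and expected costs) from finite-length trajectories, then run policy iteration on that estimated model. It is not the authors' algorithm; in particular, it uses plain policy iteration rather than their modified policy-update step, and the 4-state chain, slip probability, unit step cost, and discount factor are hypothetical.

```python
import random
from collections import defaultdict

N_STATES, GOAL, ACTIONS = 4, 3, (-1, 1)   # hypothetical chain: move left/right, state 3 is the goal

def step(s, a, rng, slip=0.1):
    # Move in direction a, or the opposite way with probability `slip`; every move costs 1.
    move = -a if rng.random() < slip else a
    return min(max(s + move, 0), N_STATES - 1), 1.0

def estimate_model(n_traj=300, horizon=20, seed=0):
    """Build an approximate MDP {(s, a): (mean cost, {s': prob})} from random trajectories."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    cost_sum, visits = defaultdict(float), defaultdict(int)
    for _ in range(n_traj):
        s = rng.randrange(N_STATES - 1)          # start anywhere except the goal
        for _ in range(horizon):
            a = rng.choice(ACTIONS)
            s2, c = step(s, a, rng)
            counts[(s, a)][s2] += 1
            cost_sum[(s, a)] += c
            visits[(s, a)] += 1
            if s2 == GOAL:
                break
            s = s2
    return {sa: (cost_sum[sa] / visits[sa], {s2: n / visits[sa] for s2, n in nxt.items()})
            for sa, nxt in counts.items()}

def q_value(model, J, s, a, gamma, unknown_cost=1e3):
    if (s, a) not in model:                      # never tried: assume it is very costly
        return unknown_cost
    cost, probs = model[(s, a)]
    return cost + gamma * sum(p * J[s2] for s2, p in probs.items())

def policy_iteration(model, gamma=0.95, sweeps=200, max_iters=50):
    policy = {s: ACTIONS[0] for s in range(N_STATES) if s != GOAL}
    J = [0.0] * N_STATES                         # J[GOAL] stays 0 (terminal)
    for _ in range(max_iters):
        for _ in range(sweeps):                  # approximate policy evaluation
            for s, a in policy.items():
                J[s] = q_value(model, J, s, a, gamma)
        improved = {s: min(ACTIONS, key=lambda a: q_value(model, J, s, a, gamma))
                    for s in policy}             # greedy (cost-minimizing) improvement
        if improved == policy:
            break
        policy = improved
    return policy, J
```

Calling `policy_iteration(estimate_model())` should return the "move right" action in every non-goal state; because all planning happens on the estimated model, the sampled transitions are reused across every evaluation sweep rather than consumed once, which is the source of the data efficiency such model-based methods aim for.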
Abstract: In this work, a visual and reactive contour-following behaviour is learned by reinforcement. With artificial vision, the environment is perceived in 3D, which makes it possible to avoid obstacles that are invisible to the sensors more commonly used in mobile robotics. Reinforcement learning reduces the need for intervention in behaviour design and simplifies its adjustment to the environment, the robot, and the task. To facilitate generalisation to other behaviours and to reduce the role of the designer, we propose a regular, image-based codification of states. Even though learning is much more difficult with this state representation, our implementation converges and is robust. Results are presented with a Pioneer 2 AT on a Gazebo 3D simulator.
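One plausible reading of a regular, image-based codification of states is to split a binarised image into a fixed grid of cells and mark each cell as occupied or free, so that each bit pattern becomes one discrete state for a tabular learner. The grid size, occupancy threshold, and binary input format below are assumptions for illustration, not the paper's exact encoding.

```python
def encode_state(image, rows, cols, threshold=0.5):
    """Map a binary image (list of rows of 0/1, 1 = obstacle or contour pixel)
    to an integer state: one bit per grid cell, row-major, first cell = highest bit."""
    h, w = len(image), len(image[0])
    if h < rows or w < cols:
        raise ValueError("image is smaller than the grid")
    state = 0
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * h // rows, (r + 1) * h // rows
            c0, c1 = c * w // cols, (c + 1) * w // cols
            cell = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            occupied = sum(cell) / len(cell) >= threshold
            state = (state << 1) | int(occupied)
    return state
```

The same encoder serves any behaviour that can be learned from the camera image, which is what makes a regular codification attractive for generalisation; the price is a larger state space, for example at most 2^12 = 4096 states for a 3×4 grid.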
Abstract: This paper employs a new approach to regulating the blood glucose level of a type I diabetic patient under intensive insulin treatment. The closed-loop control scheme incorporates expert knowledge about the treatment by using reinforcement learning theory to maintain a normoglycemic average of 80 mg/dl and a normal free plasma insulin concentration starting from a severe initial state. The insulin delivery rate is obtained off-line using the Q-learning algorithm, without requiring an explicit model of the environment dynamics. Implementing the insulin delivery rate therefore requires only a simple function evaluation and minimal online computation. Controller performance is assessed in terms of its ability to reject the effect of meal disturbances and to cope with patient-to-patient variability in the glucose-insulin dynamics. Computer simulations are used to evaluate the effectiveness of the proposed technique and to show its superiority over other existing algorithms in controlling hyperglycemia.
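The split between off-line learning and cheap on-line evaluation can be sketched as follows: Q-learning fills a table off-line, and the run-time controller is just a table lookup. The `toy_step` dynamics, glucose bins, insulin levels, and reward are hypothetical stand-ins chosen only to make the sketch runnable; they are not a glucose-insulin model and not the paper's setup. Only the 80 mg/dl target comes from the abstract.

```python
import random

BIN_WIDTH, G_MIN, G_MAX = 20, 40, 300          # hypothetical glucose discretization (mg/dl)
N_BINS = (G_MAX - G_MIN) // BIN_WIDTH + 1      # 14 bins
ACTIONS = (0.0, 1.0, 2.0, 3.0)                 # hypothetical insulin rates (arbitrary units)
TARGET = 80.0                                  # normoglycemic target from the abstract

def to_bin(g):
    g = min(max(g, G_MIN), G_MAX)
    return int((g - G_MIN) // BIN_WIDTH)

def toy_step(g, u, rng):
    """Hypothetical stand-in dynamics: upward drift, linear insulin effect, random meals."""
    meal = 40.0 if rng.random() < 0.1 else 0.0
    return min(max(g + 10.0 - 15.0 * u + meal + rng.gauss(0, 3), G_MIN), G_MAX)

def train_offline(episodes=2000, horizon=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Model-free Q-learning: the learner only samples toy_step, it never inspects it."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_BINS)]
    for _ in range(episodes):
        g = rng.uniform(G_MIN, G_MAX)
        for _ in range(horizon):
            s = to_bin(g)
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            g = toy_step(g, ACTIONS[a], rng)
            r = -abs(g - TARGET) / 100.0       # penalise distance from the target
            s2 = to_bin(g)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    return Q

def insulin_rate(Q, g):
    """On-line controller: a single table lookup per measurement."""
    s = to_bin(g)
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: Q[s][i])]
```

All of the computational cost sits in `train_offline`; at run time, `insulin_rate` needs only a bin index and an argmax over a handful of entries.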