Abstract: Deep reinforcement learning (deep RL) algorithms leverage the symbolic power of complex controllers by automating it by mapping sensory inputs to low-level actions. Deep RL eliminates the complex robot dynamics with minimal engineering. Deep RL provides high-risk involvement by directly implementing it in real-world scenarios and also high sensitivity towards hyperparameters. Tuning of hyperparameters on a pneumatic quadruped robot becomes very expensive through trial-and-error learning. This paper presents an automated learning control for a pneumatic quadruped robot using sample efficient deep Q learning, enabling minimal tuning and very few trials to learn the neural network. Long training hours may degrade the pneumatic cylinder due to jerk actions originated through stochastic weights. We applied this method to the pneumatic quadruped robot, which resulted in a hopping gait. In our process, we eliminated the use of a simulator and acquired a stable gait. This approach evolves so that the resultant gait matures more sturdy towards any stochastic changes in the environment. We further show that our algorithm performed very well as compared to programmed gait using robot dynamics.
Abstract: We present a decision-support tool to assist an operator in the detection and tracking of a suspect vehicle traveling to an unknown target destination. Multiple data sources, such as traffic cameras, traffic information, weather, etc., are integrated and processed in real-time to infer a suspect’s intended destination chosen from a list of pre-determined high-value targets. Previously, we presented our work in the detection and tracking of vehicles using traffic and airborne cameras. Here, we focus on the fusion and processing of that information to predict a suspect’s behavior. The network of cameras is represented by a directional graph, where the edges correspond to direct road connections between the nodes and the edge weights are proportional to the average time it takes to travel from one node to another. For our experiments, we construct our graph based on the greater Los Angeles subset of the Caltrans’s “Performance Measurement System” (PeMS) dataset. We propose a Bayesian approach where a posterior probability for each target is continuously updated based on detections of the suspect in the live video feeds. Additionally, we introduce the concept of ‘soft interventions’, inspired by the field of Causal Inference. Soft interventions are herein defined as interventions that do not immediately interfere with the suspect’s movements; rather, a soft intervention may induce the suspect into making a new decision, ultimately making their intent more transparent. For example, a soft intervention could be temporarily closing a road a few blocks from the suspect’s current location, which may require the suspect to change their current course. The objective of these interventions is to gain the maximum amount of information about the suspect’s intent in the shortest possible time. Our system currently operates in a human-on-the-loop mode where at each step, a set of recommendations are presented to the operator to aid in decision-making. In principle, the system could operate autonomously, only prompting the operator for critical decisions, allowing the system to significantly scale up to larger areas and multiple suspects. Once the intended target is identified with sufficient confidence, the vehicle is reported to the authorities to take further action. Other recommendations include a selection of road closures, i.e., soft interventions, or to continue monitoring. We evaluate the performance of the proposed system using simulated scenarios where the suspect, starting at random locations, takes a noisy shortest path to their intended target. In all scenarios, the suspect’s intended target is unknown to our system. The decision thresholds are selected to maximize the chances of determining the suspect’s intended target in the minimum amount of time and with the smallest number of interventions. We conclude by discussing the limitations of our current approach to motivate a machine learning approach, based on reinforcement learning in order to relax some of the current limiting assumptions.
Abstract: Maneuver decision-making plays a critical role in high-performance intelligent driving. This paper proposes a risk assessment-based decision-making network (RADMN) to address the problem of driving strategy for the commercial vehicle. RADMN integrates two networks, aiming at identifying the risk degree of collision and rollover and providing decisions to ensure the effectiveness and reliability of driving strategy. In the risk assessment module, risk degrees of the backward collision, forward collision and rollover are quantified for hazard recognition. In the decision module, a deep reinforcement learning based on multi-objective optimization (DRL-MOO) algorithm is designed, which comprehensively considers the risk degree and motion states of each traffic participant. To evaluate the performance of the proposed framework, Prescan/Simulink joint simulation was conducted in highway scenarios. Experimental results validate the effectiveness and reliability of the proposed RADMN. The output driving strategy can guarantee the safety and provide key technical support for the realization of autonomous driving of commercial vehicles.
Abstract: Nowadays, dialogue systems increasingly become the
way for humans to access many computer systems. So, humans
can interact with computers in natural language. A dialogue
system consists of three parts: understanding what humans say in
natural language, managing dialogue, and generating responses in
natural language. In this paper, we survey deep learning based
methods for dialogue management, response generation and dialogue
evaluation. Specifically, these methods are based on neural network,
long short-term memory network, deep reinforcement learning,
pre-training and generative adversarial network. We compare these
methods and point out the further research directions.
Abstract: Current trends in remote health monitoring to monetize on the Internet of Things applications have been raised in efficient and interference free communications in Wireless Body Area Network (WBAN) scenario. Co-existence interference in WBANs have aggravates the over-congested radio bands, thereby requiring efficient Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) strategies and improve interference management. Existing solutions utilize simplistic heuristics to approach interference problems. The scope of this research article is to investigate reinforcement learning for efficient interference management under co-existing scenarios with an emphasis on homogenous interferences. The aim of this paper is to suggest a smart CSMA/CA mechanism based on reinforcement learning called QIM-MAC that effectively uses sense slots with minimal interference. Simulation results are analyzed based on scenarios which show that the proposed approach maximized Average Network Throughput and Packet Delivery Ratio and minimized Packet Loss Ratio, Energy Consumption and Average Delay.
Abstract: With the increasing dependency on our computer
devices, we face the necessity of adequate, efficient and effective
mechanisms, for protecting our network. There are two main
problems that Intrusion Detection Systems (IDS) attempt to solve.
1) To detect the attack, by analyzing the incoming traffic and inspect
the network (intrusion detection). 2) To produce a prompt response
when the attack occurs (intrusion prevention). It is critical creating an
Intrusion detection model that will detect a breach in the system on
time and also challenging making it provide an automatic and with
an acceptable delay response at every single stage of the monitoring
process. We cannot afford to adopt security measures with a high
exploiting computational power, and we are not able to accept a
mechanism that will react with a delay. In this paper, we will
propose an intrusion response mechanism that is based on artificial
intelligence, and more precisely, reinforcement learning techniques
(RLT). The RLT will help us to create a decision agent, who will
control the process of interacting with the undetermined environment.
The goal is to find an optimal policy, which will represent the
intrusion response, therefore, to solve the Reinforcement learning
problem, using a Q-learning approach. Our agent will produce an
optimal immediate response, in the process of evaluating the network
traffic.This Q-learning approach will establish the balance between
exploration and exploitation and provide a unique, self-learning and
strategic artificial intelligence response mechanism for IDS.
Abstract: In this paper, we study a distributed control algorithm
for the problem of unknown area coverage by a network of robots.
The coverage objective is to locate a set of targets in the area and
to minimize the robots’ energy consumption. The robots have no
prior knowledge about the location and also about the number of the
targets in the area. One efficient approach that can be used to relax
the robots’ lack of knowledge is to incorporate an auxiliary learning
algorithm into the control scheme. A learning algorithm actually
allows the robots to explore and study the unknown environment
and to eventually overcome their lack of knowledge. The control
algorithm itself is modeled based on game theory where the network
of the robots use their collective information to play a non-cooperative
potential game. The algorithm is tested via simulations to verify its
performance and adaptability.
Abstract: Relief demand and transportation links availability is the essential information that is needed for every natural disaster operation. This information is not in hand once a disaster strikes. Relief demand and network condition has been evaluated based on prediction method in related works. Nevertheless, prediction seems to be over or under estimated due to uncertainties and may lead to a failure operation. Therefore, in this paper a stochastic programming model is proposed to evaluate real-time relief demand and network condition at the onset of a natural disaster. To address the time sensitivity of the emergency response, the proposed model uses reinforcement learning for optimization of the total relief assessment time. The proposed model is tested on a real size network problem. The simulation results indicate that the proposed model performs well in the case of collecting real-time information.
Abstract: In this paper, a hybrid expert system is developed by using fuzzy genetic network programming with reinforcement learning (GNP-RL). In this system, the frame-based structure of the system uses the trading rules extracted by GNP. These rules are extracted by using technical indices of the stock prices in the training time period. For developing this system, we applied fuzzy node transition and decision making in both processing and judgment nodes of GNP-RL. Consequently, using these method not only did increase the accuracy of node transition and decision making in GNP's nodes, but also extended the GNP's binary signals to ternary trading signals. In the other words, in our proposed Fuzzy GNP-RL model, a No Trade signal is added to conventional Buy or Sell signals. Finally, the obtained rules are used in a frame-based system implemented in Kappa-PC software. This developed trading system has been used to generate trading signals for ten companies listed in Tehran Stock Exchange (TSE). The simulation results in the testing time period shows that the developed system has more favorable performance in comparison with the Buy and Hold strategy.
Abstract: In this paper, we provided a literature survey on the
artificial stock problem (ASM). The paper began by exploring the
complexity of the stock market and the needs for ASM. ASM
aims to investigate the link between individual behaviors (micro
level) and financial market dynamics (macro level). The variety of
patterns at the macro level is a function of the AFM complexity. The
financial market system is a complex system where the relationship
between the micro and macro level cannot be captured analytically.
Computational approaches, such as simulation, are expected to
comprehend this connection. Agent-based simulation is a simulation
technique commonly used to build AFMs. The paper proceeds by
discussing the components of the ASM. We consider the roles
of behavioral finance (BF) alongside the traditionally risk-averse
assumption in the construction of agent’s attributes. Also, the
influence of social networks in the developing of agents interactions is
addressed. Network topologies such as a small world, distance-based,
and scale-free networks may be utilized to outline economic
collaborations. In addition, the primary methods for developing
agents learning and adaptive abilities have been summarized.
These incorporated approach such as Genetic Algorithm, Genetic
Programming, Artificial neural network and Reinforcement Learning.
In addition, the most common statistical properties (the stylized facts)
of stock that are used for calibration and validation of ASM are
discussed. Besides, we have reviewed the major related previous
studies and categorize the utilized approaches as a part of these
studies. Finally, research directions and potential research questions
are argued. The research directions of ASM may focus on the macro
level by analyzing the market dynamic or on the micro level by
investigating the wealth distributions of the agents.
Abstract: Economic Dispatch is one of the most important power system management tools. It is used to allocate an amount of power generation to the generating units to meet the load demand. The Economic Dispatch problem is a large scale nonlinear constrained optimization problem. In general, heuristic optimization techniques are used to solve non-convex Economic Dispatch problem. In this paper, ideas from Reinforcement Learning are proposed to solve the non-convex Economic Dispatch problem. Q-Learning is a reinforcement learning techniques where each generating unit learn the optimal schedule of the generated power that minimizes the generation cost function. The eligibility traces are used to speed up the Q-Learning process. Q-Learning with eligibility traces is used to solve Economic Dispatch problems with valve point loading effect, multiple fuel options, and power transmission losses.
Abstract: Generalization is one of the most challenging issues
of Learning Classifier Systems. This feature depends on the
representation method which the system used. Considering the
proposed representation schemes for Learning Classifier System, it
can be concluded that many of them are designed to describe the
shape of the region which the environmental states belong and the
other relations of the environmental state with that region was
ignored. In this paper, we propose a new representation scheme
which is designed to show various relationships between the
environmental state and the region that is specified with a particular
classifier.
Abstract: This paper presents a new problem solving approach
that is able to generate optimal policy solution for finite-state
stochastic sequential decision-making problems with high data
efficiency. The proposed algorithm iteratively builds and improves
an approximate Markov Decision Process (MDP) model along with
cost-to-go value approximates by generating finite length trajectories
through the state-space. The approach creates a synergy between an
approximate evolving model and approximate cost-to-go values to
produce a sequence of improving policies finally converging to the
optimal policy through an intelligent and structured search of the
policy space. The approach modifies the policy update step of the
policy iteration so as to result in a speedy and stable convergence to
the optimal policy. We apply the algorithm to a non-holonomic
mobile robot control problem and compare its performance with
other Reinforcement Learning (RL) approaches, e.g., a) Q-learning,
b) Watkins Q(λ), c) SARSA(λ).
Abstract: Investigating language acquisition is one of the most
challenging problems in the area of studying language. Syllable
learning as a level of language acquisition has a considerable
significance since it plays an important role in language acquisition.
Because of impossibility of studying language acquisition directly
with children, especially in its developmental phases, computer
models will be useful in examining language acquisition. In this
paper a computer model of early language learning for syllable
learning is proposed. It is guided by a conceptual model of syllable
learning which is named Directions Into Velocities of Articulators
model (DIVA). The computer model uses simple associational and
reinforcement learning rules within neural network architecture
which are inspired by neuroscience. Our simulation results verify the
ability of the proposed computer model in producing phonemes
during babbling and early speech. Also, it provides a framework for
examining the neural basis of language learning and communication
disorders.
Abstract: Modeling the behavior of the dialogue management in
the design of a spoken dialogue system using statistical methodologies
is currently a growing research area. This paper presents a work
on developing an adaptive learning approach to optimize dialogue
strategy. At the core of our system is a method formalizing dialogue
management as a sequential decision making under uncertainty whose
underlying probabilistic structure has a Markov Chain. Researchers
have mostly focused on model-free algorithms for automating the
design of dialogue management using machine learning techniques
such as reinforcement learning. But in model-free algorithms there
exist a dilemma in engaging the type of exploration versus exploitation.
Hence we present a model-based online policy learning
algorithm using interconnected learning automata for optimizing
dialogue strategy. The proposed algorithm is capable of deriving
an optimal policy that prescribes what action should be taken in
various states of conversation so as to maximize the expected total
reward to attain the goal and incorporates good exploration and
exploitation in its updates to improve the naturalness of humancomputer
interaction. We test the proposed approach using the most
sophisticated evaluation framework PARADISE for accessing to the
railway information system.
Abstract: Conceptualization strengthens intelligent systems in generalization skill, effective knowledge representation, real-time inference, and managing uncertain and indefinite situations in addition to facilitating knowledge communication for learning agents situated in real world. Concept learning introduces a way of abstraction by which the continuous state is formed as entities called concepts which are connected to the action space and thus, they illustrate somehow the complex action space. Of computational concept learning approaches, action-based conceptualization is favored because of its simplicity and mirror neuron foundations in neuroscience. In this paper, a new biologically inspired concept learning approach based on the probabilistic framework is proposed. This approach exploits and extends the mirror neuron-s role in conceptualization for a reinforcement learning agent in nondeterministic environments. In the proposed method, instead of building a huge numerical knowledge, the concepts are learnt gradually from rewards through interaction with the environment. Moreover the probabilistic formation of the concepts is employed to deal with uncertain and dynamic nature of real problems in addition to the ability of generalization. These characteristics as a whole distinguish the proposed learning algorithm from both a pure classification algorithm and typical reinforcement learning. Simulation results show advantages of the proposed framework in terms of convergence speed as well as generalization and asymptotic behavior because of utilizing both success and failures attempts through received rewards. Experimental results, on the other hand, show the applicability and effectiveness of the proposed method in continuous and noisy environments for a real robotic task such as maze as well as the benefits of implementing an incremental learning scenario in artificial agents.
Abstract: In this work a visual and reactive contour following
behaviour is learned by reinforcement. With artificial vision the
environment is perceived in 3D, and it is possible to avoid obstacles
that are invisible to other sensors that are more common in mobile
robotics. Reinforcement learning reduces the need for intervention in
behaviour design, and simplifies its adjustment to the environment,
the robot and the task. In order to facilitate its generalisation to other
behaviours and to reduce the role of the designer, we propose a
regular image-based codification of states. Even though this is much
more difficult, our implementation converges and is robust. Results
are presented with a Pioneer 2 AT on a Gazebo 3D simulator.
Abstract: Markov games are a generalization of Markov
decision process to a multi-agent setting. Two-player zero-sum
Markov game framework offers an effective platform for designing
robust controllers. This paper presents two novel controller design
algorithms that use ideas from game-theory literature to produce
reliable controllers that are able to maintain performance in presence
of noise and parameter variations. A more widely used approach for
controller design is the H∞ optimal control, which suffers from high
computational demand and at times, may be infeasible. Our approach
generates an optimal control policy for the agent (controller) via a
simple Linear Program enabling the controller to learn about the
unknown environment. The controller is facing an unknown
environment, and in our formulation this environment corresponds to
the behavior rules of the noise modeled as the opponent. Proposed
controller architectures attempt to improve controller reliability by a
gradual mixing of algorithmic approaches drawn from the game
theory literature and the Minimax-Q Markov game solution
approach, in a reinforcement-learning framework. We test the
proposed algorithms on a simulated Inverted Pendulum Swing-up
task and compare its performance against standard Q learning.
Abstract: This research presents a system for post processing of
data that takes mined flat rules as input and discovers crisp as well as
fuzzy hierarchical structures using Learning Classifier System
approach. Learning Classifier System (LCS) is basically a machine
learning technique that combines evolutionary computing,
reinforcement learning, supervised or unsupervised learning and
heuristics to produce adaptive systems. A LCS learns by interacting
with an environment from which it receives feedback in the form of
numerical reward. Learning is achieved by trying to maximize the
amount of reward received. Crisp description for a concept usually
cannot represent human knowledge completely and practically. In the
proposed Learning Classifier System initial population is constructed
as a random collection of HPR–trees (related production rules) and
crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is
suggested for the proposed system and based on Subsumption Matrix
(SM), a suitable fitness function is proposed. Suitable genetic
operators are proposed for the chosen chromosome representation
method. For implementing reinforcement a suitable reward and
punishment scheme is also proposed. Experimental results are
presented to demonstrate the performance of the proposed system.
Abstract: How to coordinate the behaviors of the agents through
learning is a challenging problem within multi-agent domains.
Because of its complexity, recent work has focused on how
coordinated strategies can be learned. Here we are interested in using
reinforcement learning techniques to learn the coordinated actions of a
group of agents, without requiring explicit communication among
them. However, traditional reinforcement learning methods are based
on the assumption that the environment can be modeled as Markov
Decision Process, which usually cannot be satisfied when multiple
agents coexist in the same environment. Moreover, to effectively
coordinate each agent-s behavior so as to achieve the goal, it-s
necessary to augment the state of each agent with the information
about other existing agents. Whereas, as the number of agents in a
multiagent environment increases, the state space of each agent grows
exponentially, which will cause the combinational explosion problem.
Profit sharing is one of the reinforcement learning methods that allow
agents to learn effective behaviors from their experiences even within
non-Markovian environments. In this paper, to remedy the drawback
of the original profit sharing approach that needs much memory to
store each state-action pair during the learning process, we firstly
address a kind of on-line rational profit sharing algorithm. Then, we
integrate the advantages of modular learning architecture with on-line
rational profit sharing algorithm, and propose a new modular
reinforcement learning model. The effectiveness of the technique is
demonstrated using the pursuit problem.