Abstract: A cognitive collaborative reinforcement learning
algorithm (CCRL) that incorporates an advisor into the learning
process is developed to improve supervised learning. An autonomous
learner is given a self-awareness cognitive skill that lets it decide
when to solicit instructions from the advisor. The learner can also
assess the value of advice, and accept or reject it. The method is
evaluated for robotic motion planning using simulation. Tests are
conducted for advisors with skill levels from expert to novice. The
CCRL algorithm and a combined method integrating its logic with
Clouse's Introspection Approach outperformed a baseline fully
autonomous learner, and demonstrated robust performance when
dealing with various advisor skill levels, learning to accept advice
received from an expert, while rejecting that of less skilled
collaborators. Although the CCRL algorithm is based on RL, it fits
other machine learning methods, since the advisor's actions are only
added to the outer layer.
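The two decisions described above (when to ask for advice, and whether to accept it) can be sketched as follows. This is a minimal illustration, not the CCRL algorithm itself: the function names, the return-based self-awareness test, and the tolerance-based acceptance rule are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def needs_advice(recent_returns: Sequence[float], target: float) -> bool:
    """Hypothetical self-awareness test: ask for help when the average
    of recent episode returns falls below a target level."""
    if not recent_returns:
        return True  # no experience yet, so ask
    return sum(recent_returns) / len(recent_returns) < target


def select_action(q_row: Sequence[float], advice: Optional[int],
                  tolerance: float = 0.1) -> Tuple[int, bool]:
    """Pick an action for one state; returns (action, advice_accepted).

    Assumed acceptance rule: advice is accepted only if the learner's own
    value estimate for the advised action is within `tolerance` of its
    greedy estimate.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    if advice is None:
        return greedy, False
    if q_row[advice] >= q_row[greedy] - tolerance:
        return advice, True
    return greedy, False
```

Under this assumed rule, advice from a skilled advisor tends to agree with the learner's improving value estimates and is accepted, while advice that the learner already values poorly is rejected.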
Abstract: This paper employs a new approach to regulate the
blood glucose level of a type I diabetic patient under intensive
insulin treatment. The closed-loop control scheme incorporates
expert knowledge about treatment by using reinforcement learning
theory to maintain the normoglycemic average of 80 mg/dl and the
normal condition for free plasma insulin concentration from a severe
initial state. The insulin delivery rate is obtained off-line using the
Q-learning algorithm, without requiring an explicit model of the
environment dynamics. The implementation of the insulin delivery
rate, therefore, requires simple function evaluation and minimal
online computations. Controller performance is assessed in terms of
its ability to reject the effect of meal disturbance and to overcome the
variability in the glucose-insulin dynamics from patient to patient.
Computer simulations are used to evaluate the effectiveness of the
proposed technique and to show its superiority in controlling
hyperglycemia over other existing algorithms.
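The off-line learning and cheap on-line lookup described above can be illustrated with tabular Q-learning. The sketch below is an assumption-laden toy: `toy_step` is a made-up five-level environment with a target level, not a glucose-insulin model, and all parameter values are illustrative.

```python
import random


def q_learning(step, n_states, n_actions, episodes=500, horizon=20,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2, r = step(s, a)
            # Model-free update: only sampled transitions are used.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def toy_step(s, a):
    """Toy stand-in (NOT a patient model): levels 0..4, target level 2.
    Action 0 lowers the level, 1 holds it, 2 raises it."""
    s2 = min(4, max(0, s + a - 1))
    return s2, -abs(s2 - 2)


# Off-line phase: learn the table once.
q = q_learning(toy_step, n_states=5, n_actions=3)
# On-line phase: the controller is a simple table lookup per state.
policy = [max(range(3), key=lambda a: row[a]) for row in q]
```

Once the table is learned, applying the controller on-line needs only one lookup per state, which matches the abstract's point about minimal online computation.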
Abstract: A self-tuning PID control strategy using reinforcement
learning is proposed in this paper to deal with the control of wind
energy conversion systems (WECS). Actor-Critic learning is used to
tune PID parameters in an adaptive way by taking advantage of the
model-free and on-line learning properties of reinforcement learning
effectively. In order to reduce the demand of storage space and to
improve the learning efficiency, a single RBF neural network is used
to approximate the policy function of Actor and the value function of
Critic simultaneously. The inputs of the RBF network are the system
error, as well as the first and the second-order differences of error.
The Actor can realize the mapping from the system state to PID
parameters, while the Critic evaluates the outputs of the Actor and
produces the TD error. Based on the TD error performance index and the
gradient descent method, update rules for the RBF kernel functions
and network weights are given. Simulation results show that the
proposed controller is efficient for WECS and is highly adaptable
and strongly robust, outperforming a conventional PID
controller.
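The data flow described above (error differences in, a shared RBF layer, PID gains and a value estimate out, and a TD error from the Critic) can be sketched as below. This is a structural illustration only: the function names, the Gaussian kernel with a single shared width, and the linear output layers are assumptions, and the gradient-descent update rules are not reproduced.

```python
import math


def error_state(errors):
    """Network input: the error and its first and second differences,
    computed from the three most recent error samples."""
    e0, e1, e2 = errors[-1], errors[-2], errors[-3]
    return [e0, e0 - e1, e0 - 2 * e1 + e2]


def rbf_features(x, centers, width):
    """Gaussian RBF activations shared by Actor and Critic (assumed form)."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * width ** 2))
        for c in centers
    ]


def actor_critic_outputs(phi, actor_w, critic_w):
    """Actor rows map features to (Kp, Ki, Kd); Critic maps them to a value."""
    gains = [sum(w * p for w, p in zip(row, phi)) for row in actor_w]
    value = sum(w * p for w, p in zip(critic_w, phi))
    return gains, value


def td_error(reward, value, next_value, gamma=0.95):
    """Standard temporal-difference error used as the learning signal."""
    return reward + gamma * next_value - value
```

Sharing one hidden layer between Actor and Critic is what keeps storage low: only the output weights differ between the two roles.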
Abstract: In the recent past, Learning Classifier Systems have
been successfully used for data mining. A Learning Classifier System
(LCS) is a machine learning technique that combines
evolutionary computing, reinforcement learning, supervised or
unsupervised learning, and heuristics to produce adaptive systems. An
LCS learns by interacting with an environment from which it
receives feedback in the form of numerical reward. Learning is
achieved by trying to maximize the amount of reward received. All
LCS models comprise, more or less, four main components: a finite
population of condition–action rules, called classifiers; the
performance component, which governs the interaction with the
environment; the credit assignment component, which distributes the
reward received from the environment to the classifiers accountable
for the rewards obtained; the discovery component, which is
responsible for discovering better rules and improving existing ones
through a genetic algorithm (GA). When the concatenation of the
production rules in the LCS forms the genotype, the GA operates
on a population of classifier systems; this approach is known as the
'Pittsburgh' Classifier System. LCSs that instead apply the GA at
the rule level within a population are known as 'Michigan' Classifier
Systems. The most predominant representation of the discovered
knowledge is the standard production rules (PRs) in the form of IF P
THEN D. The PRs, however, are unable to handle exceptions and do
not exhibit variable precision. Censored Production Rules
(CPRs), an extension of PRs proposed by Michalski and
Winston, exhibit variable precision and support an efficient
mechanism for handling exceptions. A CPR is an augmented
production rule of the form: IF P THEN D UNLESS C, where
the censor C is an exception to the rule. Such rules are employed in
situations in which the conditional statement IF P THEN D holds
frequently and the assertion C holds rarely. With a rule of this
type we are free to ignore the exception condition when the
resources needed to establish its presence are tight or there is simply
no information available as to whether it holds or not. Thus, the IF P
THEN D part of CPR expresses important information, while the
UNLESS C part acts only as a switch and changes the polarity of D
to ~D. In this paper, a Pittsburgh-style LCS approach is used for the
automated discovery of CPRs. An appropriate encoding scheme is
suggested to represent a chromosome consisting of a fixed-size set of
CPRs. Suitable genetic operators are designed for the set of CPRs
and for individual CPRs, and an appropriate fitness function is proposed
that incorporates basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed learning
classifier system.
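The semantics of a rule IF P THEN D UNLESS C, including the option to skip the censor when resources are tight or its truth is unknown, can be sketched as follows. The class name, the `check_censor` flag, and the bird/penguin facts are illustrative assumptions and do not reflect the paper's encoding scheme.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CensoredRule:
    """IF premise THEN D UNLESS censor."""
    premise: Callable[[dict], bool]
    censor: Callable[[dict], Optional[bool]]  # None means "unknown"

    def decide(self, facts: dict, check_censor: bool = True) -> Optional[bool]:
        """Return True for D, False for ~D, None if the rule does not fire."""
        if not self.premise(facts):
            return None
        # The censor flips D to ~D only when it is checked and known to hold.
        if check_censor and self.censor(facts) is True:
            return False
        return True


# Illustrative rule: IF bird THEN flies UNLESS penguin.
rule = CensoredRule(
    premise=lambda f: f.get("bird", False),
    censor=lambda f: f.get("penguin"),
)
```

Calling `rule.decide(facts, check_censor=False)` models the low-resource case: the frequent conclusion D is returned without spending effort on the rarely true exception.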
Abstract: Trust management and reputation models are
becoming an integral part of Internet-based applications such as CSCW,
e-commerce, and Grid Computing. The trust dimension is also a
significant social structure and key to social relations within a
collaborative community. Collaborative Decision Making (CDM) is
a difficult task in a distributed environment (with information spread
across different geographical locations) where multidisciplinary
decisions are involved, as in a Virtual Organization (VO). To aid
team decision making in a VO, Decision Support System and social
network analysis approaches are integrated. In such situations, social
learning helps an organization in terms of relationships, team
formation, partner selection, etc. In this paper we focus on trust
learning. Trust learning is an important activity in terms of
information exchange, negotiation, collaboration and trust
assessment for cooperation among virtual team members. In this
paper we propose a reinforcement learning approach that enhances the
trust decision-making capability of interacting agents during
collaboration in problem-solving activities. The computational trust
model with learning that we present is adapted to selecting the best
alternative for a new project in the organization. We verify our model in a multi-agent
simulation where the agents in the community learn to identify
trustworthy members, inconsistent behavior and conflicting behavior
of agents.
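One simple way to picture reinforcement-style trust learning is an incremental update that moves each agent's trust score toward the observed outcome of an interaction. The sketch below is an assumption made for illustration, not the trust model of the paper: the update rule, the outcome scale of 0 to 1, and the agent names are all hypothetical.

```python
def update_trust(trust, outcome, alpha=0.2):
    """Move the trust estimate toward an observed outcome in [0, 1].
    `alpha` is an assumed learning rate."""
    return trust + alpha * (outcome - trust)


def most_trusted(trust_table):
    """Return the agent with the highest current trust score."""
    return max(trust_table, key=trust_table.get)


# Hypothetical interaction history: 1.0 = cooperative, 0.0 = uncooperative.
trust_table = {"agent_a": 0.5, "agent_b": 0.5}
for agent, outcome in [("agent_a", 1.0), ("agent_b", 0.0), ("agent_a", 1.0)]:
    trust_table[agent] = update_trust(trust_table[agent], outcome)
```

With this rule, an agent whose outcomes swing between cooperative and uncooperative keeps a mid-range score, which is one possible signal of inconsistent behavior.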
Abstract: This paper outlines the development of a learning retrieval agent. The task of this agent is to extract knowledge from the Active Semantic Network in response to user requests. Based on a reinforcement learning approach, the agent learns to interpret the user's intention. In particular, the learning algorithm focuses on the retrieval of complex long-distance relations. By increasing its learnt knowledge with every request-result-evaluation sequence, the agent enhances its capability to find the intended information.
Abstract: A novel biologically inspired controller for the autonomous
navigation of a mobile robot in an evasion task is
proposed. The controller takes advantage of the environment by
calculating a measure of danger and subsequently choosing the
parameters of a reinforcement learning based decision process.
Two different reinforcement learning algorithms were used: Q-learning
and Sarsa(λ). Simulations show that selecting the parameters
dynamically reduces the time needed to execute the decision-making
process, so the robot can obtain a policy that succeeds in the escape
task in a realistic time.
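A tabular Sarsa(λ) update with parameters chosen from a danger measure might look like the sketch below. The Sarsa(λ) step with accumulating traces is standard; the `params_from_danger` mapping, its constants, and the assumption that danger lies in [0, 1] are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def params_from_danger(danger):
    """Hypothetical mapping from a danger level in [0, 1] to learning
    parameters: more danger means faster learning and less exploration."""
    danger = min(1.0, max(0.0, danger))
    return {"alpha": 0.1 + 0.4 * danger, "epsilon": 0.3 * (1.0 - danger)}


def sarsa_lambda_step(q, traces, s, a, r, s2, a2, alpha, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update with accumulating traces.
    `q` and `traces` are defaultdict(float) keyed by (state, action)."""
    delta = r + gamma * q[(s2, a2)] - q[(s, a)]
    traces[(s, a)] += 1.0
    for key in list(traces):
        # Every recently visited pair shares the TD error via its trace.
        q[key] += alpha * delta * traces[key]
        traces[key] *= gamma * lam
    return delta


q = defaultdict(float)
traces = defaultdict(float)
params = params_from_danger(1.0)
sarsa_lambda_step(q, traces, s=0, a=0, r=1.0, s2=1, a2=1, alpha=params["alpha"])
```

The traces let a single reward update several earlier state-action pairs at once, which is one reason Sarsa(λ) can reach a usable policy in fewer steps than a one-step method.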