A Cognitive Robot Collaborative Reinforcement Learning Algorithm
A cognitive collaborative reinforcement learning
algorithm (CCRL) that incorporates an advisor into the learning
process is developed to improve supervised learning. An autonomous
learner is equipped with a self-awareness cognitive skill to decide
when to solicit instructions from the advisor. The learner can also
assess the value of advice, and accept or reject it. The method is
evaluated for robotic motion planning using simulation. Tests are
conducted for advisors with skill levels from expert to novice. The
CCRL algorithm and a combined method integrating its logic with
Clouse's Introspection Approach outperformed a baseline fully
autonomous learner and demonstrated robust performance when
dealing with various advisor skill levels, learning to accept advice
received from an expert while rejecting that of less skilled
collaborators. Although the CCRL algorithm is based on reinforcement
learning, it also fits other machine learning methods, since the
advisor's actions are only added to the outer layer.
[1] C. J. C. H. Watkins, "Learning from Delayed Rewards," Ph.D.
dissertation, Psychology Dept., Cambridge University, 1989.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction,
Cambridge, MA: MIT Press, 1998.
[3] T. G. Dietterich, "Hierarchical reinforcement learning with the MAXQ
value function decomposition," Journal of Artificial Intelligence
Research, 1999, vol. 13, pp. 227-303.
[4] V. N. Papudesi and M. Huber, "Learning from reinforcement and advice
using composite reward functions," in Proc. 16th Int. FLAIRS Conf., pp.
361-365, St. Augustine, FL, 2003.
[5] L. Mihalkova and R. Mooney, "Using active relocation to aid
reinforcement," in Proc. 19th Int. FLAIRS Conf., Florida, 2006.
[6] U. Kartoun, H. Stern, and Y. Edan, "Human-robot collaborative learning
system for inspection," IEEE Int. Conf. on Systems, Man, and
Cybernetics, pp. 4249-4255, Taipei, Taiwan, 2006.
[7] V. U. Cetina, "Supervised Reinforcement Learning Using Behavior
Models," IEEE Computer Society 6th Int. Conf. on Machine Learning and
Applications, Cincinnati, Ohio, USA, 2007.
[8] C. Breazeal and A. Thomaz, "Learning from Human Teachers with
Socially Guided Exploration," IEEE Int. Conf. on Robotics and
Automation, Pasadena, CA, USA, 2008.
[9] J. A. Clouse, "An introspection approach to querying a trainer," Technical
Report 96-13, University of Massachusetts, Amherst, MA, 1996.
[10] M. A. Goodrich, R. D. R. Olsen, J. W. Crandall and T. J. Palmer,
"Experiments in adjustable autonomy," in Proceedings of the IJCAI
Workshop on Autonomy, Delegation and Control: Interacting with
Intelligent Agents, 2001, pp. 1624-1629.
@article{"International Journal of Mechanical, Industrial and Aerospace Sciences:55473", author = "Amit Gil and Helman Stern and Yael Edan", title = "A Cognitive Robot Collaborative Reinforcement Learning Algorithm", abstract = "A cognitive collaborative reinforcement learning
algorithm (CCRL) that incorporates an advisor into the learning
process is developed to improve supervised learning. An autonomous
learner is equipped with a self-awareness cognitive skill to decide
when to solicit instructions from the advisor. The learner can also
assess the value of advice, and accept or reject it. The method is
evaluated for robotic motion planning using simulation. Tests are
conducted for advisors with skill levels from expert to novice. The
CCRL algorithm and a combined method integrating its logic with
Clouse's Introspection Approach outperformed a baseline fully
autonomous learner and demonstrated robust performance when
dealing with various advisor skill levels, learning to accept advice
received from an expert while rejecting that of less skilled
collaborators. Although the CCRL algorithm is based on reinforcement
learning, it also fits other machine learning methods, since the advisor's actions are only
added to the outer layer.", keywords = "Robot learning, human-robot collaboration, motion
planning, reinforcement learning.", volume = "3", number = "5", pages = "513-8", }