A Modular On-line Profit Sharing Approach in Multiagent Domains

How to coordinate the behaviors of agents through learning is a challenging problem in multiagent domains. Because of its complexity, recent work has focused on how coordinated strategies can be learned. Here we are interested in using reinforcement learning techniques to learn coordinated actions for a group of agents without requiring explicit communication among them. However, traditional reinforcement learning methods assume that the environment can be modeled as a Markov decision process, an assumption that usually does not hold when multiple agents coexist in the same environment. Moreover, to coordinate each agent's behavior effectively toward the goal, it is necessary to augment each agent's state with information about the other agents. Yet as the number of agents in a multiagent environment increases, the state space of each agent grows exponentially, leading to a combinatorial explosion. Profit sharing is a reinforcement learning method that allows agents to learn effective behaviors from their experiences even in non-Markovian environments. In this paper, to remedy the drawback of the original profit sharing approach, which must store every state-action pair visited during an episode, we first present an on-line rational profit sharing algorithm. We then combine the advantages of a modular learning architecture with the on-line rational profit sharing algorithm and propose a new modular reinforcement learning model. The effectiveness of the technique is demonstrated on the pursuit problem.
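
To make the credit assignment concrete: in episodic profit sharing, when a reward r arrives at the end of an episode of length W, the rule (state-action pair) fired j steps before the reward receives credit f_j. Miyazaki et al. showed that ineffective rules are suppressed when the credit function satisfies the rationality condition, commonly stated as

    L * (f_t + f_(t+1) + ... + f_W) < f_(t-1)   for all t = 1, ..., W,

where L bounds the number of conflicting rules in any state; the geometrically decreasing function f_j = r / (L+1)^j satisfies it. The Python sketch below illustrates one plausible reading of the on-line variant: rather than buffering every state-action pair until the episode ends, the learner keeps decaying traces (in the spirit of replacing eligibility traces) so that the same geometric credits can be applied the moment a reward arrives. The class and parameter names are our own illustration, not the paper's.

    import random
    from collections import defaultdict

    class OnlineProfitSharing:
        """Sketch of trace-based (on-line) profit sharing.

        Assumption (ours): credits decay geometrically with ratio
        1/(L+1), which satisfies the rationality condition when L
        bounds the number of conflicting rules per state.
        """

        def __init__(self, actions, L, epsilon=0.1):
            self.actions = list(actions)
            self.decay = 1.0 / (L + 1)        # geometric credit ratio
            self.epsilon = epsilon
            self.weight = defaultdict(float)  # rule weights w(s, a)
            self.trace = defaultdict(float)   # on-line traces e(s, a)

        def select_action(self, state):
            # Epsilon-greedy over the learned rule weights.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.weight[(state, a)])

        def step(self, state, action, reward):
            # Decay every trace, then mark the rule just fired; replacing
            # (not accumulating) the trace mirrors replacing traces.
            for rule in self.trace:
                self.trace[rule] *= self.decay
            self.trace[(state, action)] = 1.0
            # A rule fired j steps ago now holds trace (L+1)^-j, so this
            # update assigns exactly the geometric credit f_j = r/(L+1)^j
            # without storing the episode history.
            if reward != 0.0:
                for rule, e in self.trace.items():
                    self.weight[rule] += reward * e

        def end_episode(self):
            self.trace.clear()

In a modular version of this learner, each agent would hold one such module per teammate (each module observing only the pairwise state) and merge the modules' weights, for example by summation, at action-selection time; this keeps every module's state space small as the number of agents grows. That merging rule is one common choice (greatest-mass style), not necessarily the one used in the paper.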
