A Modular On-line Profit Sharing Approach in Multiagent Domains
How to coordinate the behaviors of agents through
learning is a challenging problem in multi-agent domains,
and because of its complexity, recent work has focused on how
coordinated strategies can be learned. Here we are interested in using
reinforcement learning techniques to learn the coordinated actions of a
group of agents without requiring explicit communication among
them. However, traditional reinforcement learning methods assume
that the environment can be modeled as a Markov
Decision Process, an assumption that usually does not hold when multiple
agents coexist in the same environment. Moreover, to effectively
coordinate each agent's behavior so as to achieve the goal, it is
necessary to augment the state of each agent with information
about the other agents. However, as the number of agents in a
multiagent environment increases, the state space of each agent grows
exponentially, causing a combinatorial explosion.
Profit sharing is a reinforcement learning method that allows
agents to learn effective behaviors from their experiences even in
non-Markovian environments. In this paper, to remedy the drawback
of the original profit sharing approach, namely that it requires a large
amount of memory to store every state-action pair visited during learning,
we first present an on-line rational profit sharing algorithm. We then
combine the advantages of a modular learning architecture with the on-line
rational profit sharing algorithm and propose a new modular
reinforcement learning model. The effectiveness of the technique is
demonstrated on the pursuit problem.
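The core of profit sharing is episodic credit assignment: when an episode ends with a reward, the reward is propagated backward along the recorded state-action sequence, with credit shrinking geometrically with distance from the goal. The sketch below is a minimal illustration of that update, not the paper's on-line variant; the function names and the particular decay value are illustrative, and in the rational profit sharing of Miyazaki et al. the decay must be chosen small enough (relative to the number of available actions) to avoid reinforcing ineffective rules on loops.

```python
from collections import defaultdict

def profit_share(weights, episode, reward, decay=0.5):
    """Distribute the terminal reward backward over the episode.

    weights : dict mapping (state, action) -> accumulated credit
    episode : list of (state, action) pairs in the order visited
    reward  : scalar reward received at the end of the episode
    decay   : geometric discount applied per step away from the goal
    """
    credit = reward
    for state, action in reversed(episode):
        # The pair closest to the goal receives the full reward;
        # earlier pairs receive geometrically smaller credit.
        weights[(state, action)] += credit
        credit *= decay

# Example: a three-step episode ending in reward 1.0.
weights = defaultdict(float)
profit_share(weights, [("s0", "a"), ("s1", "b"), ("s2", "c")], reward=1.0)
```

After this call the last pair `("s2", "c")` holds weight 1.0, `("s1", "b")` holds 0.5, and `("s0", "a")` holds 0.25; action selection then favors higher-weight pairs. Because the update only uses the recorded episode and a terminal reward, it needs no Markovian transition model, which is why profit sharing remains usable in the partially observable settings the abstract describes.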
@article{IJICCS:56579,
  author   = "Pucheng Zhou and Bingrong Hong",
  title    = "A Modular On-line Profit Sharing Approach in Multiagent Domains",
  journal  = "International Journal of Information, Control and Computer Sciences",
  volume   = "2",
  number   = "4",
  pages    = "1160-7",
  keywords = "multi-agent learning; reinforcement learning; rational profit sharing; modular architecture"
}