Markov games generalize the Markov decision process to a
multi-agent setting. The two-player zero-sum Markov game
framework offers an effective platform for designing robust
controllers. This paper presents two novel controller design
algorithms that use ideas from the game-theory literature to produce
reliable controllers that maintain performance in the presence
of noise and parameter variations. A more widely used approach to
controller design is H∞ optimal control, which is computationally
demanding and may at times be infeasible. Our approach
generates an optimal control policy for the agent (controller) via a
simple linear program, enabling the controller to learn about the
unknown environment it faces. In our formulation, this
environment corresponds to the behavior rules of the noise,
which is modeled as the opponent. The proposed
controller architectures attempt to improve controller reliability by
gradually mixing algorithmic approaches drawn from the game
theory literature with the Minimax-Q Markov game solution
approach, in a reinforcement-learning framework. We test the
proposed algorithms on a simulated inverted pendulum swing-up
task and compare their performance against standard Q-learning.
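
As an illustration of how a linear program can yield a controller policy in a zero-sum game, the sketch below solves the maximin problem for one state and then applies the standard Minimax-Q backup (Littman, 1994). It is not the paper's two proposed algorithms. The restriction to a controller with exactly two actions, the function names solve_two_action_game and minimax_q_update, and the q_table layout are all assumptions made for this example. With two actions the LP has only the variables (p, v), so its optimum can be found exactly by checking the vertices.

```python
from fractions import Fraction
from itertools import combinations


def solve_two_action_game(q):
    """Maximin mixed policy for a controller with two actions (assumed setup).

    q[a][o] is the controller's payoff for its action a (0 or 1) against
    opponent (noise) action o. Solves
        max_p min_o  p*q[0][o] + (1-p)*q[1][o],   0 <= p <= 1,
    which is the matrix-game linear program in the variables (p, v).
    An LP optimum lies at a vertex, so it is enough to check p = 0, p = 1,
    and every crossing point of two opponent-payoff lines.
    """
    q = [[Fraction(x) for x in row] for row in q]
    n_opp = len(q[0])

    def worst_case(p):
        # Payoff guaranteed by playing action 0 with probability p.
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(n_opp))

    candidates = [Fraction(0), Fraction(1)]
    for i, j in combinations(range(n_opp), 2):
        # Line o has slope q[0][o] - q[1][o] in p; lines i and j cross where
        # p * (slope_i - slope_j) = q[1][j] - q[1][i].
        denom = (q[0][i] - q[1][i]) - (q[0][j] - q[1][j])
        if denom != 0:
            p = (q[1][j] - q[1][i]) / denom
            if 0 <= p <= 1:
                candidates.append(p)

    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


def minimax_q_update(q_table, s, a, o, reward, s_next, alpha, gamma):
    """One Minimax-Q backup: move Q(s,a,o) toward r + gamma * V(s').

    q_table[s] is a 2 x n_opp payoff table for state s (hypothetical layout);
    V(s') is the game value of the next state's table.
    """
    _, v_next = solve_two_action_game(q_table[s_next])
    q_table[s][a][o] = (1 - alpha) * q_table[s][a][o] + alpha * (reward + gamma * v_next)


# Matching pennies: the controller should randomize evenly, game value 0.
policy, value = solve_two_action_game([[1, -1], [-1, 1]])
```

For games where the controller has more than two actions, the same LP would normally be handed to a general-purpose LP solver rather than enumerated by hand.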
@article{"International Journal of Electrical, Electronic and Communication Sciences:57904",
  author = "Rajneesh Sharma and M. Gopal",
  title = "Markov Game Controller Design Algorithms",
  abstract = "Markov games generalize the Markov decision process to a multi-agent setting. The two-player zero-sum Markov game framework offers an effective platform for designing robust controllers. This paper presents two novel controller design algorithms that use ideas from the game-theory literature to produce reliable controllers that maintain performance in the presence of noise and parameter variations. A more widely used approach to controller design is H∞ optimal control, which is computationally demanding and may at times be infeasible. Our approach generates an optimal control policy for the agent (controller) via a simple linear program, enabling the controller to learn about the unknown environment it faces. In our formulation, this environment corresponds to the behavior rules of the noise, which is modeled as the opponent. The proposed controller architectures attempt to improve controller reliability by gradually mixing algorithmic approaches drawn from the game theory literature with the Minimax-Q Markov game solution approach, in a reinforcement-learning framework. We test the proposed algorithms on a simulated inverted pendulum swing-up task and compare their performance against standard Q-learning.",
  keywords = "Reinforcement learning, Markov Decision Process, Matrix Games, Markov Games, Smooth Fictitious play, Controller, Inverted Pendulum.",
  volume = "1",
  number = "10",
  pages = "1485-9",
}