Hybrid Markov Game Controller Design Algorithms for Nonlinear Systems
Markov games can be effectively used to design
controllers for nonlinear systems. The paper presents two novel
controller design algorithms by incorporating ideas from the game theory
literature that address safety and consistency issues of the
'learned' control strategy. A more widely used approach for
controller design is H∞ optimal control, which suffers from high
computational demand and may at times be infeasible. We generate
an optimal control policy for the agent (controller) via a simple
Linear Program enabling the controller to learn about the unknown
environment. The controller faces an unknown environment, and
in our formulation this environment corresponds to the behavior rules
of the noise, modeled as the opponent. The proposed approaches aim to
achieve 'safe-consistent' and 'safe-universally consistent' controller
behavior by hybridizing 'min-max', 'fictitious play' and 'cautious
fictitious play' approaches drawn from game theory. We empirically
evaluate the approaches on a simulated Inverted Pendulum swing-up
task and compare their performance against standard Q-learning.
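
To make the 'fictitious play' ingredient named in the abstract concrete, the following is a minimal sketch in Python of classical fictitious play on a single zero-sum matrix game. It is not the authors' algorithm: the function names, the uniform pseudo-count prior, the tie-breaking rule and the matching-pennies payoff matrix are all assumptions chosen for illustration. The row player stands in for the controller and the column player for the noise (opponent). Each side best-responds to the other's empirical action frequencies.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def best_response(payoff: Matrix, opponent_counts: List[int]) -> int:
    """Index of the row with the highest expected payoff against the
    opponent's empirical action frequencies (ties go to the lowest index)."""
    total = sum(opponent_counts)
    expected = [
        sum(p * c for p, c in zip(row, opponent_counts)) / total
        for row in payoff
    ]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff: Matrix, rounds: int = 5000) -> Tuple[List[float], List[float]]:
    """Run simultaneous fictitious play; payoff[i][j] is the controller's
    payoff when it plays i and the noise plays j (zero-sum game)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # The opponent's view of the game: its actions become rows and its
    # payoff is the negated controller payoff.
    opponent_payoff = [[-payoff[i][j] for i in range(n_rows)] for j in range(n_cols)]
    # Uniform pseudo-counts as an initial belief (an illustrative assumption).
    row_counts = [1] * n_rows
    col_counts = [1] * n_cols
    for _ in range(rounds):
        a = best_response(payoff, col_counts)           # controller's move
        b = best_response(opponent_payoff, row_counts)  # noise's move
        row_counts[a] += 1
        col_counts[b] += 1
    row_freq = [c / sum(row_counts) for c in row_counts]
    col_freq = [c / sum(col_counts) for c in col_counts]
    return row_freq, col_freq


if __name__ == "__main__":
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical example game
    controller_mix, noise_mix = fictitious_play(matching_pennies)
    print(controller_mix, noise_mix)
```

In a zero-sum matrix game the empirical frequencies produced this way approach a min-max mixed strategy, which is the same object a linear program would compute exactly. For the matching-pennies example both frequency vectors should drift toward an even split between the two actions.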
@article{"International Journal of Information, Control and Computer Sciences:62225",
  author = "R. Sharma and M. Gopal",
  title = "Hybrid Markov Game Controller Design Algorithms for Nonlinear Systems",
  abstract = "Markov games can be effectively used to design
controllers for nonlinear systems. The paper presents two novel
controller design algorithms by incorporating ideas from the game theory
literature that address safety and consistency issues of the
'learned' control strategy. A more widely used approach for
controller design is H∞ optimal control, which suffers from high
computational demand and may at times be infeasible. We generate
an optimal control policy for the agent (controller) via a simple
Linear Program enabling the controller to learn about the unknown
environment. The controller faces an unknown environment, and
in our formulation this environment corresponds to the behavior rules
of the noise, modeled as the opponent. The proposed approaches aim to
achieve 'safe-consistent' and 'safe-universally consistent' controller
behavior by hybridizing 'min-max', 'fictitious play' and 'cautious
fictitious play' approaches drawn from game theory. We empirically
evaluate the approaches on a simulated Inverted Pendulum swing-up
task and compare their performance against standard Q-learning.",
  keywords = "Fictitious Play, Cautious Fictitious Play, Inverted Pendulum, Controller, Markov Games, Mobile Robot",
  volume = "1",
  number = "12",
  pages = "4034-5",
}